Dataiku, AWS EKS and Fargate

Youri
Youri Registered Posts: 3 ✭✭✭✭

Some background: my company runs Dataiku on AWS EC2, with all compute carried out by a non-plugin EKS cluster. Because the cluster was provisioned in the early days of AWS EKS, it still runs on vanilla EC2 autoscaling, which performs horribly for the kind of usage patterns data scientists have in Dataiku. Seriously, I cannot stress enough how useless traditional autoscaling is for Kubernetes.

Realistically, that brings us to two options: AWS EKS node groups (with the K8s autoscaler configured), or EKS Fargate for serverless pods, a feature AWS recently made available. As a side note, using the Dataiku EKS plugin is not an option for our company.

Of course, not having to worry about whether the autoscalers work, knowing a job will always be able to run, and being able to use AWS tags per Kubernetes namespace (for internal billing purposes!) would be extremely powerful. If I get Fargate to work for Dataiku, I see no reason to even investigate EKS node groups.

In a small-scale test on our company infra, I configured Fargate in EKS and successfully got Dataiku to trigger the creation of a pod in a namespace that is backed by Fargate rather than EC2. However, I get an HTTP timeout when the Fargate pod tries to communicate back to Dataiku.
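
For readers who want to reproduce the setup, here is a minimal sketch of the request that backs a namespace with Fargate. It only builds the keyword arguments for boto3's EKS `create_fargate_profile` call; the helper name `fargate_profile_request`, the cluster name, namespace, role ARN, subnet ID and tag are all hypothetical placeholders, not the values used in the test above. The tags are attached to the profile itself; whether they show up in cost reports as hoped is not verified here.

```python
def fargate_profile_request(cluster, namespace, role_arn, subnet_ids, tags=None):
    """Build keyword arguments for boto3's EKS create_fargate_profile call.

    Pods created in `namespace` are scheduled on Fargate once the profile exists.
    """
    if not subnet_ids:
        raise ValueError("a Fargate profile needs at least one subnet")
    return {
        "fargateProfileName": f"{namespace}-fargate",  # hypothetical naming scheme
        "clusterName": cluster,
        "podExecutionRoleArn": role_arn,
        "subnets": list(subnet_ids),
        # Select every pod in the namespace; a "labels" key could narrow this down.
        "selectors": [{"namespace": namespace}],
        "tags": dict(tags or {}),
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    params = fargate_profile_request(
        cluster="my-eks-cluster",
        namespace="dataiku-jobs",
        role_arn="arn:aws:iam::123456789012:role/example-pod-execution-role",
        subnet_ids=["subnet-0example"],
        tags={"team": "data-science"},
    )
    print(params)
    # With credentials configured, the request could then be sent with:
    #   import boto3
    #   boto3.client("eks").create_fargate_profile(**params)
```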

I'm going to play around with this a bit on my own AWS environment and will let you guys know what I discover.

Have any of you been playing around with Fargate combined with Dataiku so far? And if so, what have your findings been?

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker

    Hi Youri,

    Without presuming what other community members may have to say, I can say that Dataiku itself has not tested EKS Fargate so far. The promise is indeed to be both fully compatible with K8S and fully elastic.

    One piece of feedback we had about Fargate (regular Fargate, not EKS Fargate) was that container startup time was very high, possibly making it unsuitable for semi-interactive jobs like Dataiku recipes and notebooks.

    About your timeout issue, it's important to note that in the Dataiku containerization design, the pods need to be able to connect back to the DSS machine, on any port. You will likely need to adjust security rules on the Fargate side or the VPC side to allow for this.
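
    One way to narrow down such a timeout is to test, from inside a pod, whether TCP connections back to the DSS machine succeed at all. The following is a minimal sketch using only the Python standard library; the host name and port numbers in the usage part are hypothetical placeholders, and the check says nothing about which ports DSS actually picks for a given job.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it was reachable from where this script runs."""
    return {port: can_connect(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Placeholder host and ports; replace with the DSS machine's address
    # and the ports to probe.
    for port, ok in check_ports("dss.internal.example", [11200, 11201]).items():
        print(port, "reachable" if ok else "blocked or closed")
```

    If every port reports as blocked from a Fargate pod but not from an EC2-backed pod, the security group or network ACL rules are the likely place to look.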

  • Youri
    Youri Registered Posts: 3 ✭✭✭✭

    One piece of feedback we had about Fargate (regular Fargate, not EKS Fargate) was that container startup time was very high, possibly making it unsuitable for semi-interactive jobs like Dataiku recipes and notebooks.

    Hi Clement, this is also what my limited attempts so far showed: the startup time of the pods I ran was between 30 and 60 seconds, if I remember correctly.
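
    To put a number on startup time rather than rely on memory, the delay can be computed from two timestamps Kubernetes records on every pod: `metadata.creationTimestamp` and the `lastTransitionTime` of the `Ready` condition in `status.conditions`. This is a minimal sketch; the function names are made up for illustration, and the timestamps in the usage part are invented sample values, not measurements from this thread.

```python
from datetime import datetime, timezone


def parse_k8s_time(value):
    """Parse a Kubernetes timestamp such as 2020-01-15T10:00:00Z into a UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def startup_seconds(created, ready):
    """Seconds between pod creation and the pod becoming Ready."""
    return (parse_k8s_time(ready) - parse_k8s_time(created)).total_seconds()


if __name__ == "__main__":
    # Invented sample values; in practice copy them from
    # `kubectl get pod <name> -o json`.
    print(startup_seconds("2020-01-15T10:00:00Z", "2020-01-15T10:00:45Z"))
```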

    Thanks for your very quick reply - I'll follow up if I learn or test more!
