Effort to set up Dataiku on AWS
I am looking to set up Dataiku on AWS. Is there documentation available on how to size and deploy it?
Answers
-
Hi,
The requirements for running DSS on a Linux server are listed on this page.
Then you can follow our standard Linux installation or use our pre-built Marketplace AMI to run DSS on AWS EC2.
-
Hi @Akaushal
In order to define the proper sizing, there are a few things you need to keep in mind:
For RAM: how many users will be working on the platform? Will it be used mostly for heavy Python/R processing, or for data preparation workflows that will be pushed down to SQL or Hadoop/Spark?
For instance, a server with 64 GB of RAM and 8 cores can accommodate roughly 5-10 data scientists doing in-memory machine learning, or 20-30 analysts doing mostly visual ETL with the workload pushed down to the infrastructure (Spark/Hadoop/SQL database). For CPU, count about 1 core per simultaneously active user.
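A back-of-the-envelope way to apply these figures is sketched below. It assumes RAM scales linearly from the 64 GB reference server and takes the conservative end of each range (5 data scientists or 20 analysts per 64 GB). The function names are hypothetical, and this is a rough illustration, not official Dataiku sizing guidance:

```python
import math

# Reference point from the answer: a 64 GB / 8-core server handles ~5-10 data
# scientists (in-memory ML) or ~20-30 analysts (visual ETL pushed down to
# Spark/Hadoop/SQL). We take the low end of each range to stay conservative.
REFERENCE_RAM_GB = 64
DATA_SCIENTISTS_PER_REFERENCE = 5
ANALYSTS_PER_REFERENCE = 20


def estimate_ram_gb(data_scientists: int, analysts: int) -> int:
    """Scale the 64 GB reference linearly by each user profile's share.

    Linear scaling is an assumption; real usage depends on dataset sizes
    and on how much work is pushed down to external engines.
    """
    share = (data_scientists / DATA_SCIENTISTS_PER_REFERENCE
             + analysts / ANALYSTS_PER_REFERENCE)
    # Round up to whole GB so the estimate never undershoots.
    return math.ceil(REFERENCE_RAM_GB * share)


def estimate_cores(concurrent_active_users: int) -> int:
    """About one core per simultaneously active user (at least one core)."""
    return max(1, concurrent_active_users)
```

For example, 2 data scientists plus 10 analysts gives estimate_ram_gb(2, 10) == 58, so a 64 GB machine would be the natural fit. Note that the core count is driven by concurrent users, not by everyone who has an account.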
For storage: sizing depends on how much data you will store on the DSS machine's filesystem (as opposed to an external database). Keep at least 100 GB free for the DSS data directory, configuration, libraries, etc.
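The storage rule can be written the same way: data kept locally plus the 100 GB baseline. The optional growth factor is an assumption added for illustration (it is not a DSS setting), and data living in an external database is deliberately left out:

```python
import math

# Minimum free space recommended above for the DSS data directory,
# configuration, libraries, etc.
DSS_BASE_FREE_GB = 100


def estimate_disk_gb(local_dataset_gb: float, growth_factor: float = 1.0) -> int:
    """Disk to provision on the DSS host.

    local_dataset_gb: data stored on the DSS machine's own filesystem
    (datasets in an external database are not counted).
    growth_factor: hypothetical headroom multiplier for future data growth.
    """
    if local_dataset_gb < 0 or growth_factor < 1.0:
        raise ValueError("dataset size must be >= 0 and growth_factor >= 1")
    return math.ceil(local_dataset_gb * growth_factor) + DSS_BASE_FREE_GB
```

With 200 GB of local datasets and 50% assumed headroom, estimate_disk_gb(200, 1.5) returns 400.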
As you can see on the AWS Marketplace, the default instance type is an m5.xlarge (16 GB RAM, 4 vCPUs), which is quite small.
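To turn a RAM/core estimate into an EC2 choice, a small sketch like the following picks the smallest type from a candidate list. The list (a few general-purpose m5 and memory-optimized r5 types) and the selection rule (least memory, then fewest vCPUs, ignoring price) are assumptions for illustration; check current AWS specs and pricing before deciding:

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# Hand-picked subset of EC2 types; verify against current AWS documentation.
CANDIDATES = [
    InstanceType("m5.xlarge", 4, 16),  # Marketplace default mentioned above
    InstanceType("m5.2xlarge", 8, 32),
    InstanceType("m5.4xlarge", 16, 64),
    InstanceType("m5.8xlarge", 32, 128),
    InstanceType("r5.xlarge", 4, 32),
    InstanceType("r5.2xlarge", 8, 64),
    InstanceType("r5.4xlarge", 16, 128),
]


def pick_instance(ram_gb: int, cores: int) -> Optional[InstanceType]:
    """Return the candidate with the least memory (then fewest vCPUs) that
    meets both requirements, or None if nothing in the list is big enough.

    GB and GiB are treated as interchangeable at this level of precision.
    One vCPU is counted as one "core"; on most EC2 types a vCPU is a
    hardware thread, so this is optimistic for CPU-heavy workloads.
    """
    fits = [t for t in CANDIDATES if t.memory_gib >= ram_gb and t.vcpus >= cores]
    return min(fits, key=lambda t: (t.memory_gib, t.vcpus), default=None)
```

With the 64 GB / 8-core example above, pick_instance(64, 8) returns r5.2xlarge, while the m5.xlarge default only covers a 16 GB / 4-core requirement.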
Hope this helps
A few useful links for AWS install:
https://doc.dataiku.com/dss/latest/installation/other/aws.html
https://www.dataiku.com/product/get-started/aws/
https://aws.amazon.com/marketplace/pp/Dataiku-Dataiku-DSS/B017MTTNFO
-
Thanks for your help