In order to define the proper sizing, there are a few things you need to keep in mind:
For RAM: how many users will be working on the platform? Will it be used mostly for heavy Python / R processing in memory, or for data preparation workflows that will be pushed down to SQL or Hadoop / Spark?
For instance, a server with 64GB of RAM and 8 cores can accommodate ~5-10 data scientists doing in-memory machine learning, or ~20-30 analysts doing mostly visual ETL with workloads pushed down to the infrastructure (Spark/Hadoop/SQL database). In terms of CPU, you can count 1 core per simultaneously active user.
For Storage: sizing depends on how much data you will be storing on the DSS machine's filesystem (vs. in an external DB). Keep at least 100GB free for the DSS data directory, config, libs, etc.
As you can see on the AWS marketplace, the default machine is an m5.xlarge (16GB RAM, 4 vCPUs), which is quite small by these rules of thumb.
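To make the rules of thumb above easier to apply to other machine sizes, here is a rough Python sketch. It assumes the user counts scale linearly with RAM from the 64GB reference point, which is my simplification and not an official Dataiku formula; the names `estimate_capacity` and `Capacity` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Reference point taken from the rule of thumb above: 64 GB of RAM.
REFERENCE_RAM_GB = 64
DATA_SCIENTISTS_AT_REFERENCE = (5, 10)   # in-memory Python / R work
ANALYSTS_AT_REFERENCE = (20, 30)         # visual ETL pushed down to SQL / Spark


@dataclass(frozen=True)
class Capacity:
    data_scientists: Tuple[int, int]   # (low, high) estimate
    analysts: Tuple[int, int]          # (low, high) estimate
    concurrent_active_users: int       # 1 core per simultaneously active user


def estimate_capacity(ram_gb: int, cores: int) -> Capacity:
    """Scale the 64 GB rule of thumb linearly with RAM (an assumption)."""

    def scale(users_at_reference: int) -> int:
        # Integer division rounds down, so the estimate stays conservative.
        return ram_gb * users_at_reference // REFERENCE_RAM_GB

    return Capacity(
        data_scientists=tuple(scale(n) for n in DATA_SCIENTISTS_AT_REFERENCE),
        analysts=tuple(scale(n) for n in ANALYSTS_AT_REFERENCE),
        concurrent_active_users=cores,
    )


if __name__ == "__main__":
    # Default AWS marketplace instance mentioned above: m5.xlarge (16 GB, 4 vCPUs).
    print(estimate_capacity(ram_gb=16, cores=4))
```

Under that linear assumption, the default m5.xlarge works out to roughly 1-2 data scientists or 5-7 analysts, with about 4 users active at the same time. Treat these as a starting point only; real usage depends heavily on dataset sizes and how much work is pushed down.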
Hope this helps
A few useful links for AWS install: