The Design, Automation and API nodes have different VM sizes. If we need to increase capacity, will scaling out the VMs using Azure Scale Sets work for Dataiku?
The Design and Automation nodes do not scale the same way as the API node.
The API node is a horizontally scalable, highly available service. To serve more load, you can simply add more copies of the API node server. This can be done manually, automatically using our API deployer on Kubernetes, or using Azure Scale Sets, AWS autoscaling groups or similar. However, in the case of using cloud-managed scaling capabilities, there are two things that you will need to handle yourself:
As a side note, we have found that only a small minority of customers actually need to automatically scale the API node. The API node has very high performance, ranging up to thousands of queries per second, and a small number of servers in order to ensure resilience is usually enough.
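To make the sizing point concrete, here is a rough sketch of how you might estimate the number of API node copies to run. It is not a Dataiku tool or API: the function name, the headroom factor and the throughput figures are all hypothetical, and you would need to measure the per-node throughput for your own models.

```python
import math


def api_nodes_needed(peak_qps: float, per_node_qps: float,
                     min_nodes: int = 2, headroom: float = 0.3) -> int:
    """Estimate how many API node copies to run (illustrative only).

    peak_qps:     expected peak queries per second (your own estimate)
    per_node_qps: measured throughput of one API node for your models
    min_nodes:    floor kept for resilience, even when load is low
    headroom:     fraction of each node's capacity kept in reserve
    """
    if peak_qps < 0 or per_node_qps <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid sizing parameters")
    # Capacity of one node after reserving the headroom.
    usable_qps = per_node_qps * (1 - headroom)
    # Never go below the resilience floor.
    return max(min_nodes, math.ceil(peak_qps / usable_qps))


if __name__ == "__main__":
    # Hypothetical figures: 500 qps peak, 1000 qps per node.
    print(api_nodes_needed(500, 1000))
    # A much heavier hypothetical load.
    print(api_nodes_needed(5000, 1000))
```

With figures like the first example, the resilience floor rather than the load determines the node count, which is the usual situation described above.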
The Design and Automation nodes are not horizontally scalable. You cannot simply add another copy of the Design node and get more power. This is because the Design and Automation nodes hold their own local configuration, code, and so on, and cannot be replicated. Scalability for Design and Automation nodes is achieved by leveraging Dataiku's ability to use external computation engines, and having those computation engines autoscale. For example, your Design and Automation nodes can leverage Kubernetes clusters, both for in-memory and Spark-based workloads. The Kubernetes clusters can be set up for autoscaling based on load.
We'd recommend that you discuss the matter further with your Dataiku Sales Engineer and/or Customer Success Manager, who can recommend architecture best practices for scalability.