Azure VM scale out

Solved!
Kman

Hi There,

The Design, Automation and API nodes run on VMs of different sizes. If we need to increase capacity, will scaling out the VMs with Azure Scale Sets work for Dataiku?

Thanks..

1 Solution
Clément_Stenac

Hi,

The Design and Automation nodes do not scale the same way as the API node does.

 

The API node is a horizontally-scalable, highly-available service. To serve more load, you can simply add more copies of the API node server. This can be done manually, automatically using our API Deployer on Kubernetes, or using Azure Scale Sets, AWS Auto Scaling groups, or similar. However, when using cloud-managed scaling capabilities, there are two things that you will need to handle yourself:

  • Each server in the scale set (and hence the image on which it is based) needs to have the API node software installed and set to start at boot, and the actual API services must also be set to autostart.
  • You will need a load balancer in front of your scale set; client applications then send their requests to the load balancer address rather than to individual servers (see the sketch below).
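
As a rough illustration of that last point, here is a minimal Python sketch of what a client call looks like once the load balancer fronts the scale set: clients only ever talk to the load balancer address, never to individual VMs. The hostname, port, service ID, endpoint ID and feature names below are placeholders, and the URL path and payload shape assume a standard API node prediction endpoint, so check them against your own API service definition.

    import requests

    # Hypothetical address of the load balancer in front of the API node scale set;
    # clients never target individual VMs directly.
    API_NODE_LB = "http://apinode-lb.example.internal:12000"

    # Placeholder service and endpoint IDs, as defined when the API service was deployed.
    SERVICE_ID = "fraud-scoring"
    ENDPOINT_ID = "predict-fraud"

    # Example single-record payload; the feature names depend on your model.
    payload = {"features": {"amount": 125.0, "country": "FR", "age": 42}}

    resp = requests.post(
        f"{API_NODE_LB}/public/api/v1/{SERVICE_ID}/{ENDPOINT_ID}/predict",
        json=payload,
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())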

As a side note, we have found that only a small minority of customers actually need to autoscale the API node. The API node has very high performance, handling up to thousands of queries per second, so a small number of servers, kept mainly for resilience, is usually enough.

 

The Design and Automation nodes are not horizontally scalable. You cannot simply add another copy of the Design node and get more power, because the Design and Automation nodes hold their own local configuration, code, and so on, and are not replicable. Scalability for Design and Automation nodes is instead achieved by leveraging Dataiku's ability to push computation to external engines and letting those engines autoscale. For example, your Design and Automation nodes can leverage Kubernetes clusters, both for in-memory and Spark-based workloads, and those clusters can be set up to autoscale based on load (see the sketch below).
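
To make that division of labour concrete, here is a small, purely illustrative Python sketch using the official kubernetes client to observe such a cluster from the outside: workloads pushed by the Design or Automation node show up as pods in a namespace, and the cluster autoscaler grows or shrinks the worker node pool as that pod load changes. The namespace name is a made-up example, and the sketch assumes a kubeconfig pointing at an autoscaling-enabled cluster (for instance AKS).

    # pip install kubernetes
    from kubernetes import client, config

    # Assumes a kubeconfig for the cluster used by the Design/Automation node
    # for containerized execution.
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Hypothetical namespace in which Dataiku launches its execution pods.
    DATAIKU_NAMESPACE = "dataiku-execution"

    # The worker node count rises and falls with the cluster autoscaler,
    # not with the size of the Design/Automation VM itself.
    nodes = v1.list_node().items
    print(f"Cluster worker nodes: {len(nodes)}")

    # Each in-memory or Spark workload pushed by Dataiku runs as pods here.
    for pod in v1.list_namespaced_pod(DATAIKU_NAMESPACE).items:
        print(pod.metadata.name, pod.status.phase)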

We'd recommend that you discuss the matter further with your Dataiku Sales Engineer and/or Customer Success Manager, who can recommend architecture best practices for scalability.
