How does Dataiku handle scheduling of containers
jax79sg
Registered Posts: 25 ✭✭✭✭
Hi,
Say I have a Dataiku pipeline with 3 nodes running in parallel. Each node performs some actions that require a GPU, and each node runs by being submitted as a container job to a Kubernetes cluster.
Now, while the nodes are initializing, one of them gets a FailedScheduling from the Kubernetes cluster due to a lack of GPU resources. Can I configure Dataiku so that the pipeline doesn't fail? Rather, I would want the job on this node to be queued, and somehow allow the user to see that this is happening.
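For reference, this is roughly how the stuck pods show up on the cluster side. A minimal sketch, assuming the official kubernetes Python client, pods requesting nvidia.com/gpu, and a placeholder namespace "dss-jobs":

```python
from kubernetes import client, config

# Load the local kubeconfig (use config.load_incluster_config() when running in-cluster)
config.load_kube_config()
v1 = client.CoreV1Api()

# Look at pods in the namespace DSS submits its container jobs to
for pod in v1.list_namespaced_pod("dss-jobs").items:
    if pod.status.phase == "Pending":
        # Check the pod's events for a FailedScheduling reason
        events = v1.list_namespaced_event(
            "dss-jobs",
            field_selector="involvedObject.name=%s" % pod.metadata.name,
        )
        for ev in events.items:
            if ev.reason == "FailedScheduling":
                print(pod.metadata.name, ev.message)
```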
Thank you.
Regards,
Jax
Answers
-
Hi,
There is a way to add limits on parallel jobs in DSS.
Check in Administration > Settings > Flow build:
In the additional limits, you can add a limit for the tag "gpu", set to 2 for instance.
Then, on the recipes in the flow, you can add the "gpu" tag to the recipes that use GPUs. If you do that, no more than 2 "gpu"-tagged recipes can run at the same time in DSS.
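If you want to apply the tag to many recipes at once, here is a minimal sketch using the dataikuapi client. The host, API key, project key, and recipe names are placeholders, and it assumes DSSRecipe.get_metadata()/set_metadata() are available in your dataikuapi version; the tag must match the one used in the additional limit:

```python
import dataikuapi

# Connect to the DSS instance (host and API key are placeholders)
client = dataikuapi.DSSClient("http://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("MY_PROJECT")

# Add the "gpu" tag to each GPU-using recipe so the additional limit
# (tag "gpu" -> max 2 concurrent activities) applies to them
for recipe_name in ["train_model_a", "train_model_b", "train_model_c"]:
    recipe = project.get_recipe(recipe_name)
    metadata = recipe.get_metadata()
    tags = set(metadata.get("tags", []))
    tags.add("gpu")
    metadata["tags"] = sorted(tags)
    recipe.set_metadata(metadata)
```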
Matt
-
Hi,
Is there a way to start the job and have it queued? Or will the user need to retry until a GPU becomes available?
Thank you.
Regards,
Kah Siong
-
We don't have an "add to queue" functionality, but if you set these additional limits in the settings and then start the job, it will be in a waiting mode and will only start when it's ready, meaning fewer than X recipes with that tag are currently running.
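As an illustration, a minimal sketch of starting the builds through dataikuapi and letting DSS hold the extra one. The host, API key, project key, and dataset names are placeholders, and it assumes DSSDataset.build() with wait=False is available in your dataikuapi version:

```python
import dataikuapi

client = dataikuapi.DSSClient("http://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("MY_PROJECT")

# Start all three builds at once; with the additional limit "gpu" -> 2 in place,
# DSS runs two of them and keeps the third in waiting mode until a slot frees up
jobs = []
for ds_name in ["output_a", "output_b", "output_c"]:
    dataset = project.get_dataset(ds_name)
    jobs.append(dataset.build(job_type="NON_RECURSIVE_FORCED_BUILD", wait=False))

# The waiting job is visible on the project's Jobs page; it starts
# automatically once fewer than 2 "gpu"-tagged recipes are running
```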
-
Actually, a second option would be to add several steps in the scenario to build each dataset sequentially. In a scenario, the steps run one after another, so you can control the order in which the full pipeline is updated.
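For example, a minimal sketch of a custom Python scenario step doing this. The dataset names are placeholders; each build_dataset() call blocks until the build finishes, so the builds run one at a time:

```python
from dataiku.scenario import Scenario

scenario = Scenario()

# Build the GPU-consuming datasets one after another instead of in parallel,
# so only one container job asks for a GPU at any given time
for ds_name in ["output_a", "output_b", "output_c"]:
    scenario.build_dataset(ds_name)
```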