How does Dataiku handle scheduling of containers?

jax79sg
Level 2
Hi,

Say I have a Dataiku pipeline with 3 nodes running in parallel. Each node performs actions that require a GPU, and each node runs by being submitted as a container job to a Kubernetes cluster.

Now, while the nodes are initializing, one of them gets a FailedScheduling event from the Kubernetes cluster due to a lack of GPU resources. Can I configure Dataiku so that the pipeline doesn't fail? Rather, I would want the job on this node to be queued, and somehow allow the user to see that this is happening.
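For reference, each of these jobs requests a GPU roughly like the following (a simplified sketch using the official kubernetes Python client; the image and names are made up):

```python
# Simplified sketch (illustrative names/image): a pod requesting one GPU.
# If no node has a free nvidia.com/gpu, the scheduler emits a
# FailedScheduling event and the pod stays Pending.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="dss-gpu-job"),  # illustrative name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="worker",
                image="my-gpu-image:latest",  # illustrative image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # one GPU per job
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```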

Thank you.

Regards,

Jax
Mattsco
Dataiker

Hi,

There is a way to add limits on parallel jobs in DSS.

Check in Administration > Settings > Flow build:

Under the additional limits, you can add a limit for the tag "gpu", set to 2 for instance.

Then, on the recipes in the flow, add the "gpu" tag to each recipe that uses GPUs.

If you do that, no more than 2 recipes tagged "gpu" can run at the same time in DSS.
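For example, the tagging can also be done programmatically through the public API (a rough sketch, assuming your DSS version exposes add_tag on recipe settings; the host, key, and recipe names are illustrative):

```python
# Rough sketch (illustrative host/key/names): tag GPU recipes via the
# public API so the "gpu" additional limit applies to them.
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com", "MY_API_KEY")
project = client.get_project("MY_PROJECT")

for recipe_name in ["train_model_a", "train_model_b", "train_model_c"]:
    settings = project.get_recipe(recipe_name).get_settings()
    settings.add_tag("gpu")  # assumes the taggable-object settings API
    settings.save()
```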



Matt
jax79sg
Level 2
Author
Hi,

Is there a way to start the job and have it queued? Or will the user need to retry until a GPU becomes available?

Thank you.

Regards,
Kah Siong
Mattsco
Dataiker
We don't have an "add to queue" functionality, but if you set these additional limits in the settings and start the job, it will sit in a waiting state and start only when it's allowed to: that is, when fewer than X recipes with that tag are currently running.
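Conceptually, the per-tag limit behaves like a counting semaphore; here is a toy illustration of the waiting behavior (not DSS internals):

```python
# Toy illustration (not DSS internals): a per-tag limit of 2 acts like
# a counting semaphore. Jobs over the limit wait instead of failing.
import threading
import time

gpu_slots = threading.Semaphore(2)  # "gpu" tag limit = 2

def run_recipe(name):
    print(f"{name}: waiting for a free slot")
    with gpu_slots:        # blocks until a slot frees up
        print(f"{name}: running")
        time.sleep(1)      # stand-in for the actual job
    print(f"{name}: done")

threads = [threading.Thread(target=run_recipe, args=(f"recipe_{i}",))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```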
Mattsco
Mattsco
Dataiker
Actually, a second option would be to add several steps to the scenario, building each dataset sequentially. In a scenario, steps run one after another, so you can control the order in which the full pipeline is updated.
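As a variant of the same idea, a single "Execute Python code" scenario step can also build the datasets one by one (a minimal sketch; the dataset names are illustrative):

```python
# Inside an "Execute Python code" scenario step: build the GPU datasets
# strictly one after another. Dataset names are illustrative.
from dataiku.scenario import Scenario

scenario = Scenario()
for name in ["gpu_output_1", "gpu_output_2", "gpu_output_3"]:
    scenario.build_dataset(name)  # blocks until this build completes
```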
Mattsco