How to define the best Global limit for Jobs

NN
Hi Dataiku Team,
Both the global and the per-job limits on concurrent activities default to 5 for the instance.

https://doc.dataiku.com/dss/latest/flow/limits.html

I believe this means that, on a single instance, if project-1 has 5 concurrent activities running, a job in project-2 has to wait until one of project-1's activities completes. (I hope I understand this correctly.)

My question: is there a best practice for defining these limits?
For example, if my instance has configuration X, the limits can be set higher than 5 but should stay below 10, or something along those lines?

Thanks.

1 Solution
sergeyd
Dataiker

Hi @NN 

There is no single best practice here, because the right value depends on the type of activities (one job can run for 24 hours while another 10 jobs finish in just 1 minute), the system resources available to the DSS instance, and so on.

So I would suggest increasing the global limit gradually (I probably wouldn't recommend going higher than 20) and monitoring the DSS load. If needed, you can also fine-tune the limits by user/recipeType/project:

https://doc.dataiku.com/dss/latest/flow/limits.html#additional-limits
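
To make the point about activity types concrete, here is a minimal toy model in Python. It is not how DSS schedules activities internally; it simply assumes that all activities are queued at time 0, start in order, and that at most `limit` of them run at once. The `simulate` function and the sample mix are hypothetical, and the model ignores CPU, memory, and database contention entirely.

```python
import heapq


def simulate(durations, limit):
    """Toy model: all activities are queued at time 0 and started in order,
    with at most `limit` running at once. Returns (makespan, total_wait)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    running = []       # min-heap of finish times of the running activities
    makespan = 0
    total_wait = 0
    for duration in durations:
        # A free slot means an immediate start; otherwise wait for the
        # earliest-finishing running activity to release its slot.
        start = 0 if len(running) < limit else heapq.heappop(running)
        total_wait += start
        finish = start + duration
        makespan = max(makespan, finish)
        heapq.heappush(running, finish)
    return makespan, total_wait


if __name__ == "__main__":
    # Hypothetical mix in minutes: one 24-hour activity plus ten 1-minute ones.
    mix = [24 * 60] + [1] * 10
    for limit in (1, 5, 10, 20):
        makespan, total_wait = simulate(mix, limit)
        print(f"limit={limit:>2}  makespan={makespan} min  total wait={total_wait} min")
```

In this toy mix the single long activity dominates the total run time at every limit above 1, so a higher limit mostly shortens the waiting of the short activities. A different mix can behave very differently, which is why there is no universal number.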

 

 

Marlan

Hi @NN,

This is a good question. We've wondered about this as well.

I thought I'd share our experience with setting a global limit. It won't necessarily be helpful, but it may serve as another reference point.

  • We had been running into issues with jobs waiting under the default limit of 5 activities on our Dev (Design) instance
  • We do run a fair number of automation jobs on the Dev instance rather than on Automation instances (I'd prefer we not do that, but nonetheless that is the current state)
  • Most of the jobs are pushed down to a SQL database
  • Our Dev instance is fairly beefy, including lots of memory (I'd have to look up the details)
  • We upped the limit to 10 and haven't had any noticeable problems since the change.
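
For anyone who wants a rough number to watch after raising the limit, here is a minimal Python sketch that reads the host's load average on the DSS server. It only uses the standard library (`os.getloadavg` is available on Unix-like systems); the helper names and the 0.7 threshold are assumptions for illustration, not a Dataiku recommendation. It also says nothing about memory, or about the load on the SQL database when jobs are pushed down.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Normalize a load average by the number of CPUs."""
    return load_avg / max(cpu_count, 1)


def has_headroom(load_avg, cpu_count, threshold=0.7):
    """Return True when the normalized load is below `threshold`.
    The 0.7 default is an arbitrary illustrative value."""
    return load_per_cpu(load_avg, cpu_count) < threshold


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"5-minute load per CPU: {load_per_cpu(five_min, cpus):.2f}")
    print(f"Headroom for a higher limit: {has_headroom(five_min, cpus)}")
```

Running this a few times while the instance is busy, before and after a change to the limit, gives a simple comparison point.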

Marlan

NN
Author

Thank you @sergeyd and @Marlan for your input.

I will try various limits and see which gives us the best usage.
