Scenario run time in automation is much longer than in design.

AnnaProba Registered Posts: 9


I have a scenario that runs about 35 Scala-Spark recipes spread over 10 scenario steps (some steps run only one recipe). The scenario takes about 1h20 on our design node, while on the automation node it ranges between 3h and 9h on different days. We have been trying to understand this difference for several weeks now, and no one internally has been able to explain it.

We suspect it has something to do with cluster load, because when a scenario on the automation node starts at 9:00 in the morning, the run times are much higher than when it runs at 22:00 at night. However, the scenario on the design node always runs for ~1h20, independently of the time of day.
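To quantify the time-of-day effect, one could pull the durations of recent runs from the automation node via Dataiku's public Python client. A minimal sketch, assuming the `dataikuapi` package and its `DSSScenario.get_last_runs()` call; the host URL, API key, project key, and scenario id are placeholders:

```python
from datetime import datetime

def run_duration_hours(run_info):
    """Duration in hours from a raw scenario-run dict ('start'/'end' are epoch ms)."""
    return (run_info.get("end", 0) - run_info.get("start", 0)) / 3_600_000

def print_recent_runs(host, api_key, project_key, scenario_id, limit=20):
    """Fetch the last runs of a scenario and print start time and duration,
    so any correlation with the time of day becomes visible."""
    import dataikuapi  # Dataiku's public API client
    client = dataikuapi.DSSClient(host, api_key)
    scenario = client.get_project(project_key).get_scenario(scenario_id)
    for run in scenario.get_last_runs(limit=limit):
        info = run.run  # raw run dict returned by the API
        start = datetime.fromtimestamp(info["start"] / 1000)
        print(f"{start:%Y-%m-%d %H:%M}  {run_duration_hours(info):5.2f} h")

# Example call (placeholders, not real credentials):
# print_recent_runs("https://automation-node:11200", "API_KEY", "MY_PROJECT", "my_scenario")
```

Sorting the output by hour of day should make the 9:00-vs-22:00 pattern you describe directly measurable.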

Our DevOps team tells us that the Spark cluster used for the computation is the same for both automation and design, and that the jobs use the same queuing system. Let's assume this is the case. We have also verified that the spark-submit settings of our recipes are unchanged in automation.
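Besides the per-recipe spark-submit settings, the instance-level named Spark execution configs could differ between the two nodes. A sketch of how one might diff them from the general-settings payloads of both instances; note that the `sparkSettings`/`executionConfigs`/`conf` field names are assumptions about the payload layout, so inspect `client.get_general_settings().settings` on your own instance to confirm them:

```python
def diff_spark_configs(design_settings, automation_settings):
    """Return the names of Spark execution configs whose key/value pairs differ
    between two DSS general-settings payloads (plain dicts).
    Field names ('sparkSettings', 'executionConfigs', 'conf') are assumptions."""
    def as_map(settings):
        configs = settings.get("sparkSettings", {}).get("executionConfigs", [])
        return {c.get("name"): {kv.get("key"): kv.get("value") for kv in c.get("conf", [])}
                for c in configs}
    design, autom = as_map(design_settings), as_map(automation_settings)
    return sorted(name for name in set(design) | set(autom)
                  if design.get(name) != autom.get(name))

# Usage sketch (placeholders for hosts and keys):
# import dataikuapi
# design = dataikuapi.DSSClient("https://design:11200", "KEY_D").get_general_settings().settings
# autom = dataikuapi.DSSClient("https://automation:11200", "KEY_A").get_general_settings().settings
# print(diff_spark_configs(design, autom))
```

Even with identical recipe settings, a differing default execution config (executor memory, dynamic allocation, queue name) on the automation node would change every job submitted from it.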

The difference must then be caused by some Dataiku configuration that differs between design and automation.

What configuration parameters could cause such an enormous difference in computation time? It can't be concurrency, because even single-recipe scenario steps take longer. What other parameters should we look into?

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    Answer ✓


We would need the full job diagnostics from both nodes to start; your DSS admins should be able to help obtain them. Please share those in a support ticket rather than on the Community directly.


