Scenario run time in automation is much longer than in design.
Hello,
I have a scenario that runs about 35 scala-spark recipes spread over 10 scenario steps (some steps run only one recipe). The scenario takes about 1h20 on our design node, while on the automation node it takes between 3h and 9h depending on the day. We have been trying to understand this difference for several weeks now, and no one internally has been able to give us an answer.
We know it must have something to do with the load on the cluster, because when the scenario on the automation node starts at 9:00 in the morning, the run times are much higher than when it starts at 22:00. However, the scenario on the design node always runs for ~1h20, regardless of the time of day.
Our dev ops tell us that the Spark cluster used for the calculation is the same for the automation and design nodes and that the jobs use the same queuing system. Let's assume this is the case. We have also verified that the spark-submit settings of our recipes are unchanged in automation.
Then the difference must be caused by a configuration of Dataiku between design and automation.
What configuration parameters could cause such an enormous difference in run time? It can't be concurrency, because even single-recipe scenario steps take longer. What other parameters should we look into?
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,211 Dataiker
Hi,
To start, we would need the full job diagnostics from both nodes. Your DSS admins should be able to help. Please share them in a support ticket, not directly on the community.
Thanks
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,211 Dataiker
Hi @AnnaProba,
It's difficult to speculate in such cases. I would suggest you share the job/scenario diagnostics of the same job, run on the same underlying data, from both the design and automation nodes.
https://doc.dataiku.com/dss/latest/troubleshooting/problems/job-fails.html#getting-a-job-diagnosis
Once you have both diagnostics, please open a support ticket and share them so we can review.
https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#editor-support-for-all-other-dataiku-customers
Thanks
-
Hi @AlexT,
I don't have access to the job in automation to export the zip, but I have the logs that come with a mail reporter. Are they sufficient?
I will ask internally whether I am allowed to share them.
Thank you.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,877 Neuron
"It can't be concurrency, because even single-recipe scenario steps turn longer."
Even if single-recipe scenario steps take longer, the following settings could be impacting concurrency:
Administration => Settings => Flow Build => Concurrent activities
Also check JVM and memory settings on your instances:
https://doc.dataiku.com/dss/latest/operations/memory.html
https://doc.dataiku.com/dss/latest/installation/custom/advanced-java-customization.html
Finally, you haven't given us any information about your server configuration (CPU, RAM, disk, network, etc.), which can have an impact on the execution of jobs.
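If it helps to gather the server details mentioned above, here is a minimal sketch that could be run on both the design and automation hosts so the outputs can be compared side by side. It uses only the Python standard library; the function name `node_summary` and the choice of `/` as the disk to inspect are my own assumptions, and you would probably want to point it at the volume that holds your DSS data directory instead. The RAM lookup relies on `os.sysconf`, which works on Linux but not on every platform, so it falls back to `None`.

```python
import os
import platform
import shutil


def node_summary(path="/"):
    """Return basic hardware facts for the machine this script runs on.

    `path` is the filesystem location whose disk usage is reported
    (an assumption: replace it with the volume that holds your DSS data).
    """
    usage = shutil.disk_usage(path)
    summary = {
        "hostname": platform.node(),
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_free_gb": round(usage.free / 1e9, 1),
    }
    try:
        # Physical RAM via sysconf; available on Linux, not on every platform.
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        summary["ram_total_gb"] = round(ram_bytes / 1e9, 1)
    except (ValueError, OSError, AttributeError):
        summary["ram_total_gb"] = None
    return summary


if __name__ == "__main__":
    for key, value in node_summary().items():
        print(f"{key}: {value}")
```

This only covers the host itself. It says nothing about the Spark cluster or the network between the node and the cluster, which would need to be checked separately.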
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,877 Neuron
You really need to speak with your Dataiku Administrator so you can go through the settings together and verify that everything matches. It's easy for an Administrator to say "it's all configured the same". Ask for evidence: screenshots of GUI configuration screens, connections, etc. Ask them to share all the DSS config files under DSS_HOME/config.
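Once you have copies of the config files from both nodes, comparing them by hand gets tedious. Below is a rough sketch of how the comparison could be automated with the Python standard library. It assumes you have copied each node's DSS_HOME/config into two local folders (the folder names in the usage note are hypothetical) and that the files are plain text; `diff_config_dirs` is just an illustrative helper, not part of any Dataiku tooling.

```python
import difflib
from pathlib import Path


def diff_config_dirs(design_dir, automation_dir):
    """Compare two directory trees of text files and return report lines."""
    design, automation = Path(design_dir), Path(automation_dir)
    # Union of relative file paths found on either side.
    names = sorted(
        {p.relative_to(design).as_posix() for p in design.rglob("*") if p.is_file()}
        | {p.relative_to(automation).as_posix() for p in automation.rglob("*") if p.is_file()}
    )
    report = []
    for name in names:
        a, b = design / name, automation / name
        if not a.is_file():
            report.append(f"only on automation: {name}")
        elif not b.is_file():
            report.append(f"only on design: {name}")
        else:
            a_lines = a.read_text(errors="replace").splitlines()
            b_lines = b.read_text(errors="replace").splitlines()
            if a_lines != b_lines:
                report.append(f"differs: {name}")
                # Append a unified diff with one line of context per change.
                report.extend(
                    difflib.unified_diff(
                        a_lines, b_lines,
                        "design/" + name, "automation/" + name,
                        lineterm="", n=1,
                    )
                )
    return report
```

For example, `print("\n".join(diff_config_dirs("design_config", "automation_config")))` would list the files that exist on only one node and show the changed lines for the rest. Some differences (hostnames, ports, node type) are expected, and the copied files may contain credentials, so keep them and the report internal.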
-
Thank you, @AlexT and @Turribeach. I will talk to the admins about the diagnostic export and we will open a support ticket. At least now I know where to look for answers. Thank you!