How to Run Daily-Parameterized Flows in Parallel Without Global Variable Conflicts in Dataiku?

Hello,
This is a question about executing the same Flow in parallel.
I have a Flow that is designed to process data on a daily basis, and several recipes within the Flow refer to a project-level global variable called Job_Date.
Each daily run takes about 30 minutes, making it a long-running Flow.
Now, I have a requirement to process 90 days’ worth of data, and I’d like to reduce the total processing time.
I initially created a custom Python script scenario that loops through each day by updating ${Job_Date} and executing the Flow sequentially, but this takes too long.
To speed things up, I considered creating a separate scenario that runs the Flow, and then triggering that scenario asynchronously in parallel batches of 5 days using a Python script.
However, I encountered a problem where the Job_Date global variable gets overwritten by the last run, causing interference between the parallel executions.
Is there a way to run the Flow in parallel for each day without conflicts, even though it was originally designed to handle one day at a time using the global variable?
Thanks,
Sangcheul
Operating system used: RHEL
Best Answer
Turribeach
The first issue you have is that you shouldn't be using global variables for a runtime, data-driven parameter. Global variables are meant for other use cases, such as environment-specific configuration. You should be using scenario variables or scenario parameters instead. Both exist only at runtime, so they will not be left set once the scenario execution finishes.
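A minimal sketch of the scenario-parameter approach, assuming the Flow is wrapped in its own scenario and a driver script calls it through the public API. The project key `MY_PROJECT`, the scenario id `DAILY_FLOW`, and the helper names `daily_dates` and `run_backfill` are hypothetical; `run_and_wait(params=...)` is the public API call used to pass the date with each run. The runs are still sequential here.

```python
from datetime import date, timedelta


def daily_dates(start, days):
    """Return ISO date strings for `days` consecutive days starting at `start`."""
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]


def run_backfill(scenario, start, days):
    """Run the scenario once per day, passing the date as a run parameter.

    `scenario` is assumed to be a scenario handle from the public API, e.g.
    dataiku.api_client().get_project("MY_PROJECT").get_scenario("DAILY_FLOW")
    (both names are placeholders).
    """
    for job_date in daily_dates(start, days):
        # The date travels with this run only; no project variable is modified.
        scenario.run_and_wait(params={"Job_Date": job_date})


# Inside the called scenario, a first "Execute Python code" step could copy
# the parameter into a scenario variable so recipes can keep using ${Job_Date}
# (assumption: the parameter key matches the one sent by the driver):
#
#   from dataiku.scenario import Scenario
#   s = Scenario()
#   s.set_scenario_variables(Job_Date=s.get_trigger_params()["Job_Date"])
```

Because the value is scoped to a single run, nothing has to be reset afterwards, and a failed run cannot leave a stale date behind in the project variables.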
However, even using these variable types, you can't execute your scenario in parallel. To get around that, you will need to either redesign your Flow so it can work with multiple dates at the same time, or use an Application-as-recipe.
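For the redesign option, one way to picture it is to stop passing a single day and pass a range of days per run instead. The sketch below only shows how 90 days could be split into 5-day batches; the start date and the helper name `date_batches` are illustrative assumptions, and how the recipes filter on the range depends on your Flow.

```python
from datetime import date, timedelta


def date_batches(start, days, batch_size):
    """Split `days` consecutive days starting at `start` into lists of at most `batch_size` days."""
    all_days = [start + timedelta(days=i) for i in range(days)]
    return [all_days[i:i + batch_size] for i in range(0, days, batch_size)]


# Hypothetical 90-day backfill starting on 2024-01-01, in batches of 5 days.
batches = date_batches(date(2024, 1, 1), 90, 5)

for batch in batches:
    # Each run would receive a first and last date instead of a single Job_Date.
    first_day, last_day = batch[0].isoformat(), batch[-1].isoformat()
    print(first_day, last_day)
```

With this split, the 90 days become 18 runs rather than 90, provided the recipes can handle several dates in one pass.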
Answers
Hi Turribeach,
Following your suggestion, I will be redesigning the Flow. I'm delighted to have gained an understanding of the different types of variables, their proper usage, and the concept of the Application-as-recipe, all thanks to your insights. Your help is greatly appreciated.
Many thanks,
Sangcheul