Running same set of recipes multiple times in parallel with different parameters

pratikgujral-sf · March 2023

Hi Community,

In my dataset, I have a categorical column called customer_segment with 10 different possible values.

I wish to train 10 different models, one for each customer_segment, using only the records filtered for that particular segment. We have a data preparation recipe, which is just Python code. As each customer_segment is independent of the others, we want to be able to run the data preparation, model training, and evaluation steps for each customer_segment in parallel, passing a different value of customer_segment to the recipes each time.

Furthermore, for ease of maintenance, we do not wish to create 10 copies of the same code for data preparation, training, and evaluation.

Is it possible to do so with a Flow?

I'm attaching a sample flow for illustrative purposes to help explain my question.

[Image: sample flow, added for illustrative purposes]

Operating system used: Red Hat Enterprise Linux

Alexandru · March 2023

Hi @pratikgujral-sf,
If I understand your requirements correctly, a partitioned model would do essentially what you are looking for.
You would partition the input dataset by customer_segment and train a partitioned model on it, so that one model is trained per partition: https://doc.dataiku.com/dss/latest/machine-learning/partitioned.html
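
To illustrate the idea only, here is a minimal plain-Python sketch of "one pipeline, many partitions". It is not the Dataiku API and not how DSS implements partitioned models; the toy records, the `spend` column, the function names, and the mean-based stand-in "model" are all hypothetical. It just shows the same prepare/train code being run once per customer_segment value, in parallel, without copying the code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy records; in practice these would come from the input dataset.
RECORDS = [
    {"customer_segment": "A", "spend": 10.0},
    {"customer_segment": "A", "spend": 14.0},
    {"customer_segment": "B", "spend": 3.0},
    {"customer_segment": "B", "spend": 5.0},
    {"customer_segment": "C", "spend": 8.0},
]


def partition_by_segment(records):
    """Group rows by their customer_segment value (one partition per segment)."""
    partitions = defaultdict(list)
    for row in records:
        partitions[row["customer_segment"]].append(row)
    return dict(partitions)


def prepare(rows):
    """Stand-in data preparation: keep the non-missing 'spend' values."""
    return [row["spend"] for row in rows if row["spend"] is not None]


def train(values):
    """Stand-in 'model': just the mean of the prepared values (assumption)."""
    return {"mean_spend": mean(values), "n_rows": len(values)}


def run_pipeline(segment, rows):
    """The single, shared pipeline, parameterized by segment."""
    return segment, train(prepare(rows))


def run_all(records, max_workers=4):
    """Run the same pipeline once per segment, in parallel threads."""
    partitions = partition_by_segment(records)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_pipeline, segment, rows)
            for segment, rows in partitions.items()
        ]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    models = run_all(RECORDS)
    for segment in sorted(models):
        print(segment, models[segment])
```

In DSS itself, the partitioning and the per-partition training are handled by the Flow and the partitioned model, so you would not write this loop yourself.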

If you wish to bundle the flow and make it reusable, you can also look at app-as-recipe:
https://doc.dataiku.com/dss/8.0/applications/application-as-recipe.html

Thanks
