Running the same set of recipes multiple times in parallel with different parameters
Hi Community,
In my dataset, I have a categorical column called customer_segment with 10 different possible values.
I want to train 10 different models, one per customer_segment, using only the records filtered for that particular segment. Our data preparation recipe is plain Python code. Since each customer_segment is independent of the others, we want to run the data preparation, model training, and evaluation steps for each customer_segment in parallel, passing a different customer_segment value to the recipes on each run.
Furthermore, for ease of maintenance, we do not want to create 10 copies of the same code for data preparation, training, and evaluation.
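Conceptually, the pattern is a single parameterized pipeline invoked once per segment. A toy Python sketch of the idea (not actual DSS recipe code; the records, the in-memory filtering, and the mean-as-model stand-in are all illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy records; in practice these would come from the input dataset.
RECORDS = [
    {"customer_segment": "A", "amount": 10.0},
    {"customer_segment": "A", "amount": 20.0},
    {"customer_segment": "B", "amount": 30.0},
    {"customer_segment": "B", "amount": 50.0},
]

def prepare(segment):
    """Data preparation: keep only the rows for one segment."""
    return [r for r in RECORDS if r["customer_segment"] == segment]

def train(rows):
    """Stand-in 'model': the mean amount for the segment."""
    return sum(r["amount"] for r in rows) / len(rows)

def run_pipeline(segment):
    """One prep -> train run, parameterized by the segment value."""
    return segment, train(prepare(segment))

# Run the same code once per segment, in parallel.
with ThreadPoolExecutor() as pool:
    models = dict(pool.map(run_pipeline, ["A", "B"]))

print(models)  # {'A': 15.0, 'B': 40.0}
```

The point is that there is exactly one copy of the prep/train code, and only the parameter varies per run.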
Is it possible to do so with a Flow?
I'm attaching a sample flow for illustrative purposes to help explain my question.
Operating system used: Red Hat Enterprise Linux
Best Answer
Alexandru (Dataiker) · Dataiku DSS Core Designer, ML Practitioner, Adv Designer · Posts: 1,226
Hi @pratikgujral-sf,
If I understand your requirements, a partitioned model would essentially do what you are looking for.
You would partition the input dataset on customer_segment and then train a partitioned model on it, so each partition gets its own model: https://doc.dataiku.com/dss/latest/machine-learning/partitioned.html
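To make the mechanism concrete, here is a minimal sketch of what "one model per partition" means, in plain Python rather than DSS (the rows, the grouping step, and the per-segment mean "model" are illustrative assumptions; DSS performs the split for you once the dataset is partitioned on customer_segment):

```python
from collections import defaultdict

# Toy rows standing in for the input dataset; the first field is
# the column the dataset would be partitioned on.
rows = [
    ("A", 1.0), ("A", 3.0),
    ("B", 2.0), ("B", 6.0),
    ("C", 5.0),
]

# 1. Split the data by partition value (what partitioning the
#    dataset on customer_segment gives you in DSS).
partitions = defaultdict(list)
for segment, value in rows:
    partitions[segment].append(value)

# 2. Run the same training code once per partition. The "model"
#    here is just the per-segment mean, as a stand-in.
models = {seg: sum(vals) / len(vals) for seg, vals in partitions.items()}

print(models)  # {'A': 2.0, 'B': 4.0, 'C': 5.0}
```

A partitioned model in DSS is this idea applied at the Flow level: one training recipe, one model object, with a separate trained version per partition value.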
If you want to bundle the flow and make it reusable, you can also look at application-as-recipe:
https://doc.dataiku.com/dss/8.0/applications/application-as-recipe.html
Thanks