
Automation of the cross validation with several recipes to be executed

Solved!
MaxenceQueyrel
Level 1

Hello,

I run a flow whose input dataset is split by a Python recipe into training and validation sets.
The training and validation sets then go through several recipes (custom or not) until the results are saved in a folder.
My question is: how can I automate cross-validation with my split recipe? For example, I would like to use different predefined seeds or splits and run the flow multiple times until all tests are completed. I am attaching a screenshot of my flow, with the split recipe on the left, the calculation recipes at the top or not shown (they are in other views), and the last recipe, which saves the results in my folder.

Thank you for your help,

Maxence.

flow.png

1 Solution
StanG
Dataiker

Hi Maxence,
You can define project variables (in your case, the seed and the number of splits) and retrieve them in the Python recipe using:

dataiku.api_client().get_project(dataiku.default_project_key()).get_variables()
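As a rough sketch of what the recipe could do with the result: get_variables() returns a dictionary with the project variables under the 'standard' key, so the settings can be pulled out with fallbacks. The helper name read_cv_settings, the variable name 'n_splits' and the default values are assumptions made for this illustration, not part of the Dataiku API.

```python
def read_cv_settings(variables, default_seed=0, default_n_splits=5):
    """Extract the seed and number of splits from a project-variables dict.

    `variables` is expected to look like the dict returned by get_variables(),
    i.e. {'standard': {...}, 'local': {...}}. The key names 'seed' and
    'n_splits' are hypothetical and must match what you defined in the project.
    """
    standard = variables.get("standard", {})
    # Coerce to int in case the values were stored as strings
    seed = int(standard.get("seed", default_seed))
    n_splits = int(standard.get("n_splits", default_n_splits))
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    return seed, n_splits
```

In the recipe, you would pass it the dictionary obtained from the call above.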

Then you can create a Python script (in a scenario, for example) that rebuilds the flow once per value of the variable, with something like:

import dataiku

seeds = [0, 1, 2]  # example values; replace with your predefined seeds
project = dataiku.api_client().get_project(dataiku.default_project_key())
# YOUR_DATASET: name (string) of the final dataset to build
dataset_to_build = project.get_dataset(YOUR_DATASET)
for seed in seeds:
    # Update the seed in the project variables before each run
    project_variables = project.get_variables()
    project_variables['standard']['seed'] = seed
    project.set_variables(project_variables)
    # Force a rebuild of the dataset and everything upstream of it
    dataset_to_build.build(job_type='RECURSIVE_FORCED_BUILD')
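On the split-recipe side, here is a minimal sketch of how a seed and a fold number could be turned into a reproducible training/validation split. It uses only the standard library; the function name kfold_indices and the 'fold' variable are assumptions for illustration (the fold would be one more project variable that the loop sets before each build), and your own split logic may of course differ.

```python
import random


def kfold_indices(n_rows, n_splits, fold, seed):
    """Return (train_idx, valid_idx) for one fold of a shuffled K-fold split.

    The same seed always yields the same shuffle, so every run of the flow
    with a given (seed, fold) pair sees the same rows in validation.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if not 0 <= fold < n_splits:
        raise ValueError("fold must be in [0, n_splits)")
    indices = list(range(n_rows))
    # Seeded, local random generator: does not touch the global random state
    random.Random(seed).shuffle(indices)
    # Every n_splits-th shuffled row, starting at `fold`, goes to validation
    valid_idx = sorted(indices[fold::n_splits])
    valid_set = set(valid_idx)
    train_idx = [i for i in range(n_rows) if i not in valid_set]
    return train_idx, valid_idx
```

With a pandas DataFrame df, the two sets would then be df.iloc[train_idx] and df.iloc[valid_idx]. Looping over fold = 0 .. n_splits - 1 for a fixed seed puts each row in the validation set exactly once.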




2 Replies

MaxenceQueyrel
Level 1
Author

Hello Stan,

Thank you for your answer, this is working well!
That was all I needed to solve my problem.
