Hello,
I run a flow that takes a dataset as input; a Python recipe splits it into training and validation sets.
Then the training and validation sets go through several recipes (custom or not) until the results are saved in a folder.
My question is: how can I automate cross-validation with my split recipe? For example, I would like to use different predefined seeds or splits and run the flow multiple times until all tests are completed. I am attaching a screenshot of my flow, with the split recipe on the left, the calculation recipes at the top (some are not shown because they are in other views), and the last recipe, which saves the results in my folder.
Thank you for your help,
Maxence.
 
Hi Maxence,
You can define project variables (for your seed and number of splits in your case) and retrieve these variables in the python recipe using:
dataiku.api_client().get_project(dataiku.default_project_key()).get_variables()
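As a rough sketch of how the split recipe might use these variables: the functions below take a dictionary shaped like the one returned by get_variables() and assign each row index to a fold in a reproducible way. The function names and the 'n_splits' variable name are hypothetical, chosen only for illustration; the code uses the Python standard library only, so it can be tried outside of Dataiku.

```python
import random


def assign_folds(n_rows, variables):
    """Return a list giving a fold number (0..n_splits-1) for each row index.

    `variables` is assumed to look like the dict returned by
    project.get_variables(), e.g. {"standard": {"seed": 42, "n_splits": 5}}.
    """
    standard = variables["standard"]
    seed = int(standard["seed"])
    n_splits = int(standard.get("n_splits", 5))  # hypothetical variable name

    # Shuffle row indices deterministically so the same seed gives the same split
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled rows out to the folds in turn, like cards
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        folds[row] = position % n_splits
    return folds


def train_validation_rows(folds, validation_fold):
    """Split row indices into (training, validation) lists for one fold."""
    train = [i for i, f in enumerate(folds) if f != validation_fold]
    valid = [i for i, f in enumerate(folds) if f == validation_fold]
    return train, valid
```

In the recipe, `variables` would come from the get_variables() call above and `n_rows` from the length of the input dataframe; the two returned lists can then be used to select the training and validation rows.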
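Then you can create a Python script (in a scenario, for example) that rebuilds the flow once per value of the variables, with something like:
import dataiku

seeds = [1, 2, 3]  # example values; use your own predefined seeds
project = dataiku.api_client().get_project(dataiku.default_project_key())
# YOUR_DATASET is a placeholder for the name of the dataset to build
dataset_to_build = project.get_dataset(YOUR_DATASET)
for seed in seeds:
    # Update the project variable that the split recipe reads
    project_variables = project.get_variables()
    project_variables['standard']['seed'] = seed
    project.set_variables(project_variables)
    # Rebuild the dataset and everything upstream of it with the new seed
    dataset_to_build.build(job_type='RECURSIVE_FORCED_BUILD')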
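One possible refinement of the loop above, sketched under assumptions: wrap it in a helper that puts the project variables back to their original values once all builds are done, even if one of them fails. build_for_each_seed is a hypothetical name, not part of the Dataiku API; it relies only on the get_variables, set_variables and build calls already used in this answer.

```python
import copy


def build_for_each_seed(project, dataset, seeds, job_type="RECURSIVE_FORCED_BUILD"):
    """Build `dataset` once per seed, then restore the original variables."""
    # Keep a copy of the variables as they were before the runs
    original = copy.deepcopy(project.get_variables())
    try:
        for seed in seeds:
            # Set the seed that the split recipe will read, then rebuild
            variables = project.get_variables()
            variables["standard"]["seed"] = seed
            project.set_variables(variables)
            dataset.build(job_type=job_type)
    finally:
        # Put the variables back even if one of the builds fails
        project.set_variables(original)
```

In a scenario, this would be called as build_for_each_seed(project, dataset_to_build, seeds). Note that each run rewrites the same outputs, so you may want the recipe that saves the results to include the seed in the file name.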
Hello Stan,
Thank you for your answer, this is working well!
That was all I needed to solve my problem.