Adding the same recipe (filter) to multiple datasets at once
I have 100+ datasets and need to apply the same filter to each of them, returning the rows that have the value ‘A’ in a column named grade. Should I do it one by one with a filter recipe, or is there a way to do it all at once?
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,128 Neuron
You can use the Dataiku Python API to create the recipes programmatically.
Answers
-
@Turribeach
Thank you for your answer. Do I have to set up all the outputs myself, or can the code generate the output datasets?
Turribeach
You can create the output at the same time you create the recipe. Here is some sample code:
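A minimal sketch of creating a recipe and its output dataset in one go, assuming the visual Sample/Filter recipe is identified by the type "sampling" and that project is a DSSProject handle. The helper name create_filter_recipe and the dataset and connection names in the usage note are hypothetical. The filter condition itself is not set here and still has to be defined in the recipe settings afterwards.

```python
def create_filter_recipe(project, input_name, output_name, connection):
    """Create a Sample/Filter recipe together with a new output dataset.

    `project` is expected to be a dataikuapi DSSProject handle.
    """
    # Assumption: "sampling" is the type identifier of the Sample/Filter recipe
    builder = project.new_recipe("sampling")
    builder.with_input(input_name)
    # Creates the output dataset on the given connection along with the recipe
    builder.with_new_output(output_name, connection)
    return builder.create()
```

It could be called as create_filter_recipe(project, "students_2023", "students_2023_out", "my_snowflake_connection"), where the second and third arguments are dataset names and the last one is the name of a connection defined in DSS.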
And here is how you get a handle to the Dataiku API client and to a project:
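A minimal sketch of getting those handles, assuming the code runs inside DSS (a notebook or a recipe) where the dataiku package is available. The helper name get_project_handle is hypothetical, and the import sits inside the function only so that the snippet can also be loaded outside DSS.

```python
def get_project_handle(project_key=None, client=None):
    """Return a project handle, creating the API client if none is passed in."""
    if client is None:
        import dataiku  # only importable inside DSS
        client = dataiku.api_client()
    if project_key is None:
        # Falls back to the project the notebook or recipe is running in
        return client.get_default_project()
    return client.get_project(project_key)
```

Calling get_project_handle() with no arguments should give the current project; passing a project key selects another project explicitly.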
-
@Turribeach
Thank you for your reply. I have a question about creating the outputs, as I need to figure out how to get started. Let's say I have 122 datasets. I thought I had to do it like this:
1. Press Shift and drag with the mouse to select all 122 datasets, then click Python under Actions > Code recipes.
2. Set one output (its name and "store into"), as it's required.
3. Create the rest of the outputs in the code.
I'm using Python inside DSS. Is this what I'm supposed to do? Also, how do I set "where to store into"? All the input data is stored in Snowflake and I want the output to be stored in Snowflake as well.
-
Turribeach
If you are going to do this programmatically then you should go all the way, no half-baked approaches. If you already have the 122 input datasets in your project, you can loop through each dataset like this:
import dataiku

client = dataiku.api_client()
project = client.get_project("some project key")
datasets = project.list_datasets()
# Returns a list of DSSDatasetListItem
for dataset in datasets:
    # Quick access to main information in the dataset list item
    print("Name: %s" % dataset.name)
    print("Type: %s" % dataset.type)
    print("Connection: %s" % dataset.connection)
    print("Tags: %s" % dataset.tags)  # Returns a list of strings
    # You can also use the list item as a dict of all available dataset information
    print("Raw: %s" % dataset)
Then simply create a new recipe for each dataset, using it as the input and creating a new output dynamically with something like this:
dataset_output_name = dataset.name + "_out"
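Putting the pieces together, here is a rough sketch of the whole loop. It assumes that one Sample/Filter recipe has already been configured by hand in the UI with the grade filter, and that the filter definition lives in the recipe's JSON payload, so it can be copied to every new recipe. The helper name add_filter_recipes, its parameters and the template recipe are all hypothetical. Each output is created on the same connection as its input, which for Snowflake inputs should keep the outputs in Snowflake.

```python
def add_filter_recipes(project, template_recipe_name, input_names=None, suffix="_out"):
    """Create one Sample/Filter recipe per input dataset, copying the filter
    definition from a template recipe that was configured once in the UI."""
    # Payload of the hand-configured recipe (assumed to hold the filter)
    template = project.get_recipe(template_recipe_name)
    payload = template.get_settings().get_json_payload()

    created = []
    for item in project.list_datasets():
        name = item.name
        # Skip datasets that were not requested
        if input_names is not None and name not in input_names:
            continue
        # Skip outputs produced by an earlier run of this function
        if name.endswith(suffix):
            continue

        output_name = name + suffix
        # Assumption: "sampling" is the type of the Sample/Filter recipe
        builder = project.new_recipe("sampling")
        builder.with_input(name)
        # Store the output on the same connection as the input
        builder.with_new_output(output_name, item.connection)
        recipe = builder.create()

        # Copy the template's filter definition onto the new recipe
        settings = recipe.get_settings()
        settings.set_json_payload(payload)
        settings.save()
        created.append(output_name)
    return created
```

Passing an explicit input_names list is the safer option, since otherwise every dataset in the project that does not end with the suffix is treated as an input, including the template recipe's own input and output. It is worth trying this on two or three datasets first and checking the generated recipes in the Flow before running it on all 122.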
-
Thank you for your help @Turribeach