Adding same recipe(filter) to multiple datasets at once

Solved!
ShrimpMania
Level 1
I have 100+ datasets and need to apply the same filter to each of them, keeping only the rows that have the value 'A' in a column named grade. Do I have to do this one by one with a filter recipe, or is there a way to do it all at once?

1 Solution
Turribeach

You can use the Dataiku Python API to create the recipes programmatically. 

ShrimpMania
Level 1
Author

@Turribeach Thank you for your answer. Do I have to set up all the output datasets myself, or can the code generate them?

Turribeach

You can create the output at the same time you create the recipe. Here is some sample code:

https://developer.dataiku.com/latest/api-reference/python/projects.html#dataikuapi.dss.project.DSSPr...

And here is how you get a handle to the Dataiku API client and to a project:

https://developer.dataiku.com/latest/concepts-and-examples/projects.html#handling-an-existing-projec...
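As a rough sketch of creating the output together with the recipe, the function below uses the recipe creator returned by project.new_recipe(). It assumes that "sampling" is the recipe type behind the visual Sample/Filter recipe, and that the connection name you pass exists on your instance. The function name create_filter_recipe and its parameters are made up for illustration.

```python
def create_filter_recipe(project, input_name, output_name, connection):
    """Create one Sample/Filter recipe together with its output dataset.

    `project` is a DSSProject handle, for example the result of
    dataiku.api_client().get_project("some project key").
    `connection` is the name of the DSS connection that stores the output.
    """
    # Assumption: "sampling" is the type of the visual Sample/Filter recipe.
    builder = project.new_recipe("sampling")
    builder.with_input(input_name)
    # The output dataset is created on `connection` when the recipe is created.
    builder.with_new_output(output_name, connection)
    return builder.create()
```

The connection argument plays the role of the "Store into" setting in the UI, so passing the name of a Snowflake connection should place the new dataset in Snowflake. The filter condition itself is not set here and still has to be configured in the recipe settings.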

 

ShrimpMania
Level 1
Author

@Turribeach Thank you for your reply. I have a question about creating the outputs.

I need to figure out how to get started. Let's say I have 122 datasets. I thought I had to do it like this:

1. Hold Shift and drag with the mouse to select all 122 datasets, then click Python under Actions > Code recipes.

2. Set one output (its name and "Store into"), since at least one is required.

3. Create the rest of the outputs in the code.

I'm using Python inside DSS. Is this what I'm supposed to do? Also, how do I set "Store into"? All the input datasets are stored in Snowflake, and I want the outputs to be stored in Snowflake as well.

Turribeach

If you are going to do this programmatically, then you should go all the way, with no half-baked approaches. If the 122 input datasets are already in your project, you can loop through each dataset like this:

https://developer.dataiku.com/latest/concepts-and-examples/datasets/datasets-other.html#listing-data...

 

import dataiku
client = dataiku.api_client()

project = client.get_project("some project key")

datasets = project.list_datasets()
# Returns a list of DSSDatasetListItem

for dataset in datasets:
    # Quick access to main information in the dataset list item
    print("Name: %s" % dataset.name)
    print("Type: %s" % dataset.type)
    print("Connection: %s" % dataset.connection)
    print("Tags: %s" % dataset.tags) # Returns a list of strings

    # You can also use the list item as a dict of all available dataset information
    print("Raw: %s" % dataset)

 

 

Then simply create a new recipe for each one, using it as the input and creating a new output dynamically with something like this:

 

dataset_output_name = dataset.name + "_out"
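
Putting the loop and the output naming together, a minimal sketch could look like the one below. It is not a definitive implementation: it assumes "sampling" is the recipe type of the visual Sample/Filter recipe, it reuses each input's connection for the output so Snowflake inputs get Snowflake outputs, and it optionally copies a JSON payload taken from one filter recipe configured by hand. The names create_filter_recipes, suffix and template_payload are hypothetical.

```python
def create_filter_recipes(project, suffix="_out", template_payload=None):
    """Create one Sample/Filter recipe per dataset; return the new output names.

    `project` is a DSSProject handle. `template_payload` is an optional dict,
    assumed to be the JSON payload of a filter recipe configured by hand.
    """
    items = project.list_datasets()
    existing = {item.name for item in items}
    created = []
    for item in items:
        output_name = item.name + suffix
        # Skip outputs of an earlier run and inputs that already have an
        # output, so the function can be rerun without chaining recipes.
        if item.name.endswith(suffix) or output_name in existing:
            continue
        # Assumption: "sampling" is the type of the Sample/Filter recipe.
        builder = project.new_recipe("sampling")
        builder.with_input(item.name)
        # Store the output on the same connection as its input.
        builder.with_new_output(output_name, item.connection)
        recipe = builder.create()
        if template_payload is not None:
            # Copy the filter settings of the hand-made template recipe.
            settings = recipe.get_settings()
            settings.set_json_payload(template_payload)
            settings.save()
        created.append(output_name)
    return created
```

One way to get template_payload is to build a single filter recipe in the UI that keeps rows where grade is 'A', then read its payload with project.get_recipe("filter_template").get_settings().get_json_payload(), where filter_template is a placeholder for whatever you named that recipe. This relies on the payload holding only the filter definition and nothing specific to one dataset, so it is worth printing the payload and checking one generated recipe by hand before running it on all 122 datasets.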
ShrimpMania
Level 1
Author

Thank you for your help @Turribeach 
