
Python recipe to create a new Dataiku dataset

I would like to create a Dataiku dataset from Python recipe code, without creating it manually as a recipe output. I am able to do this from a notebook in Dataiku, but the same code fails in a recipe with the following error:

Dataset ABC cannot be used : declare it as input or output of your recipe

I am using the following code to create the Dataiku dataset from a notebook:

import dataiku

# Get a handle on the current project through the public API client
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
project_variables = dataiku.get_custom_variables()
csv_dataset_name = 'ABC'
params = {'connection': 'xyz', 'path': project_variables['projectKey'] + '/' + csv_dataset_name}
format_params = {'separator': '\t', 'style': 'unix', 'compress': ''}
# Create the dataset in the project
csv_dataset = project.create_dataset(csv_dataset_name, type='Filesystem', params=params, formatType='csv', formatParams=format_params)
# Mark the dataset as managed; the change is only persisted once the definition is saved back
ds_def = csv_dataset.get_definition()
ds_def['managed'] = True
csv_dataset.set_definition(ds_def)
# Handle used to read from or write to the new dataset
output_file = dataiku.Dataset(csv_dataset_name)

Unfortunately, what you are trying to achieve is not possible. A recipe cannot "modify its own Flow".

To guarantee consistency and isolation of jobs, each job runs on a consistent snapshot of the Flow. Adding a dataset through the API does not add it to that snapshot, so the recipe remains unaware that the dataset exists and hence can't write into it.

What you can do instead is use a "Python code" step in a scenario. Scenarios do not run on a snapshot and hence, Python steps can create datasets and write into them.

Alternatively, you could have a scenario with:

* First, a Python step that creates the dataset
* Then a build step that runs the recipe

Please note that in this latter case, you will need to use:

dataset = dataiku.Dataset("my_new_dataset", ignore_flow=True)
# ignore_flow=True indicates that you accept writing to a dataset that is not declared as an output of the recipe. It is only needed in recipes.
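
For the first scenario step (the Python step that creates the dataset), a minimal sketch could look like the following. It reuses the `create_dataset` call from the question; the helper name `create_dataset_if_missing`, the dataset name `my_new_dataset`, and the path are assumptions for illustration, and the check against `list_datasets()` is only there so that re-running the scenario does not try to create the same dataset twice.

```python
def create_dataset_if_missing(project, name, connection, path):
    """Create a Filesystem CSV dataset unless one with this name already exists.

    `project` is expected to be a project handle such as the one returned by
    dataiku.api_client().get_project(...). This helper is hypothetical and is
    not part of the Dataiku API.
    """
    # list_datasets() returns dict-like items that expose the dataset name
    existing = {item["name"] for item in project.list_datasets()}
    if name in existing:
        return project.get_dataset(name)

    # Same arguments as in the question: tab-separated CSV on a filesystem connection
    return project.create_dataset(
        name,
        type="Filesystem",
        params={"connection": connection, "path": path},
        formatType="csv",
        formatParams={"separator": "\t", "style": "unix", "compress": ""},
    )


# Assumed usage inside the scenario's Python step:
# import dataiku
# client = dataiku.api_client()
# project = client.get_project(dataiku.default_project_key())
# create_dataset_if_missing(project, "my_new_dataset", "xyz", "/my_new_dataset")
```

The build step that follows can then run the recipe, which opens the dataset with `ignore_flow=True` as shown above.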

Thanks for the reply.

What if I want to create a dataset whose name follows the format "ABCYYYYMMDDHHMMSS"?

Thanks in advance.