
Python recipe to create a new Dataiku dataset

rs0105
Level 1

I would like to create a Dataiku dataset from Python recipe code, without creating it manually in the recipe. I am able to do this from a notebook in Dataiku, but it fails in a recipe with the following error:

Dataset ABC cannot be used : declare it as input or output of your recipe

I am using the following code to create the Dataiku dataset from a notebook:

import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
project_variables = dataiku.get_custom_variables()
csv_dataset_name = 'ABC'
params = {'connection': 'xyz', 'path': project_variables['projectKey'] + '/' + csv_dataset_name}
format_params = {'separator': '\t', 'style': 'unix', 'compress': ''}
# Create the dataset through the public API, then mark it as managed
csv_dataset = project.create_dataset(csv_dataset_name, type='Filesystem', params=params, formatType='csv', formatParams=format_params)
ds_def = csv_dataset.get_definition()
ds_def['managed'] = True
csv_dataset.set_definition(ds_def)
# output_file_df is a pandas DataFrame (its construction is not shown here)
output_file = dataiku.Dataset(csv_dataset_name)
output_file.write_with_schema(output_file_df)

Clément_Stenac
Dataiker

Hi,

Unfortunately, what you are trying to achieve is not possible. A recipe cannot "modify its own Flow".

To guarantee consistency and isolation of jobs, each job runs on a consistent snapshot of the Flow. Adding datasets through the API does not add them to the snapshot, so the recipe remains unaware that the new dataset exists and hence can't write into it.

What you can do instead is use a "Python code" step in a scenario. Scenarios do not run on a snapshot and hence, Python steps can create datasets and write into them.

Alternatively, you could have a scenario with:

* First, a Python step that creates the dataset
* Then a build step that runs the recipe

Please note that in this latter case, you will need to use:

dataset = dataiku.Dataset("my_new_dataset", ignore_flow=True)
# ignore_flow=True tells DSS that you accept writing to a dataset that is not
# a declared output of the recipe. It is only needed in recipes.
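
As a rough illustration of the "Python code" scenario step described above, here is a minimal sketch that reuses the API calls from the original question. The helper names `build_dataset_settings` and `create_dataset_in_scenario` are hypothetical, and the connection name 'xyz' and the tab-separated format are simply copied from the question, so adjust them to your instance. The `dataiku` import is kept inside the function because the package is only available inside DSS.

```python
def build_dataset_settings(project_key, dataset_name, connection="xyz"):
    """Return (params, format_params) for a tab-separated filesystem dataset.

    Hypothetical helper: the values mirror the ones used in the question.
    """
    params = {"connection": connection, "path": project_key + "/" + dataset_name}
    format_params = {"separator": "\t", "style": "unix", "compress": ""}
    return params, format_params


def create_dataset_in_scenario(dataset_name, connection="xyz"):
    """Create a managed dataset; intended to be called from a scenario Python step."""
    import dataiku  # only importable inside DSS

    client = dataiku.api_client()
    project_key = dataiku.default_project_key()
    project = client.get_project(project_key)

    params, format_params = build_dataset_settings(project_key, dataset_name, connection)
    dataset = project.create_dataset(
        dataset_name,
        type="Filesystem",
        params=params,
        formatType="csv",
        formatParams=format_params,
    )

    # Mark the dataset as managed, as in the original notebook code
    definition = dataset.get_definition()
    definition["managed"] = True
    dataset.set_definition(definition)
    return dataset
```

In the two-step variant, the scenario's Python step would call `create_dataset_in_scenario("my_new_dataset")`, and the recipe run by the following build step would then open that dataset with `ignore_flow=True` before writing to it.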
rs0105
Level 1
Author

Thanks for the reply.

What if I want to create a dataset whose name has the format "ABCYYYYMMDDHHMMSS"?

Thanks in advance.
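
For the naming format asked about here, one possible approach is to build the name with Python's standard `datetime` module before creating the dataset. This is only a sketch: the helper name `timestamped_name` is hypothetical, and it uses the server's local time.

```python
from datetime import datetime


def timestamped_name(prefix="ABC", now=None):
    """Return prefix followed by a YYYYMMDDHHMMSS timestamp."""
    # Fall back to the current local time when no timestamp is supplied
    if now is None:
        now = datetime.now()
    return prefix + now.strftime("%Y%m%d%H%M%S")


# Example with a fixed timestamp so the result is reproducible
print(timestamped_name("ABC", datetime(2021, 3, 4, 5, 6, 7)))  # ABC20210304050607
```

The resulting string can then be used as the dataset name in the creation code shown earlier in the thread.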
