new dataset write error
I'm trying to build a piece of code that runs as a recipe. The code builds two datasets, one of which is a database connection, and then runs a SQL query to build out the other. My issue is that when I run the recipe and try to write to the dataset, I keep getting a message that it can't be found. It's obviously there, because I can see it in the flow where the dataset was instantiated. Here is the basic code:
import dataiku
from dataiku import SQLExecutor2

# build database connection
dataset_params = {
    "connection": connection,
    "schema": schema_name,
    "table": table_name
}
dataset_teradata = project.get_dataset(connection_dataset_name)
if not dataset_teradata.exists():
    dataset_teradata = project.create_dataset(
        dataset_name=connection_dataset_name,
        type=type_database_connection,
        params=dataset_params,
        formatParams={}
    )

# create csv dataset shell
path_to_training_full = project_id + '/' + training_dataset_name
params = {'connection': 'filesystem_managed', 'path': path_to_training_full}
format_params = {'separator': '\t', 'style': 'unix', 'compress': ''}
training_full = project.get_dataset(training_dataset_name)
if not training_full.exists():
    training_full = project.create_dataset(training_dataset_name, type='Filesystem', params=params,
                                           formatType='csv', formatParams=format_params)
    # mark the new dataset as managed
    ds_def = training_full.get_definition()
    ds_def['managed'] = True
    training_full.set_definition(ds_def)

# get the sql code
sql_file_path = sql_file_name + '.sql'
with open(sql_file_path, 'r') as fd:
    sqlFile = fd.read()

# use the sql code to make a pandas dataframe
executor = SQLExecutor2(dataset=dataiku.Dataset(connection_dataset_name, project_key=project_id, ignore_flow=True))
training_full_df = executor.query_to_df(sqlFile)

# try to write the pandas dataframe to the csv dataset
dataset_training_full = dataiku.Dataset(training_dataset_name, project_key=project_id, ignore_flow=True)
dataset_training_full.write_with_schema(training_full_df)
The error:
Oops: an unexpected error occurred Error in Python process: At line 145: <class 'Exception'>: Dataset TEST_PLUGIN_DEV_CJ.training_full cannot be used : declare it as input or output of your recipe
The funny thing is that if I run the recipe twice in a row... it works
Does anyone have experience trying to make datasets with code? Eventually I want this to be a plugin, so that's why I was trying to run it as a recipe. Maybe there is a better option?!
Thanks
CJ
Operating system used: Ubuntu
Answers
Turribeach
I am guessing that if you are creating datasets on the fly, you might need to update the recipe via the API to include the new input/output. Have a look at the dataset API methods. This check is only enforced for recipes, which is why the same code works in a notebook.
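To make the suggestion above a bit more concrete, here is a minimal sketch of declaring a newly created dataset as an output of a recipe through the public API. It assumes `project` is a `dataikuapi` project handle and that the recipe settings object exposes `get_flat_output_refs()`, `add_output()` and `save()`, so check those against the API docs for your DSS version. The helper name `ensure_recipe_output`, the role `"main"` and the recipe name in the usage comment are hypothetical. Nothing in this thread confirms whether a change made while the recipe is already running is picked up by that same run, so it may need to happen before the job starts.

```python
def ensure_recipe_output(project, recipe_name, dataset_name, role="main"):
    """Declare dataset_name as an output of recipe_name if it is not one already.

    `project` is assumed to be a dataikuapi project handle, e.g. obtained with
    dataiku.api_client().get_project(project_key).
    Returns True if the recipe was modified, False if nothing had to change.
    """
    recipe = project.get_recipe(recipe_name)
    settings = recipe.get_settings()

    # Nothing to do if the dataset is already declared as an output.
    if dataset_name in settings.get_flat_output_refs():
        return False

    # Add the dataset under the given output role and persist the change.
    settings.add_output(role, dataset_name)
    settings.save()
    return True


# Usage inside DSS (hypothetical recipe name, not taken from the thread):
#   import dataiku
#   project = dataiku.api_client().get_project(project_id)
#   ensure_recipe_output(project, "my_recipe_name", training_dataset_name)
```

The helper takes the project handle as an argument instead of creating it, so the same function can be called from a notebook, a scenario step, or plugin code.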