new dataset write error
I'm trying to build a piece of code that runs as a recipe. The code builds two datasets, one of which is a database connection, and then runs a SQL query to build out the other. My issue is that when I run the recipe and try to write to the dataset, I keep getting a message that it can't be found. It's obviously there, because I can see it in the flow from where the dataset was instantiated. Here is the basic code:
import dataiku
from dataiku import SQLExecutor2

# build database connection
dataset_params = {
    "connection": connection,
    "schema": schema_name,
    "table": table_name
}
dataset_teradata = project.get_dataset(connection_dataset_name)
if not dataset_teradata.exists():
    dataset_teradata = project.create_dataset(
        dataset_name=connection_dataset_name,
        type=type_database_connection,
        params=dataset_params,
        formatParams={}
    )

# create csv dataset shell
path_to_training_full = project_id + '/' + training_dataset_name
params = {'connection': 'filesystem_managed', 'path': path_to_training_full}
format_params = {'separator': '\t', 'style': 'unix', 'compress': ''}
training_full = project.get_dataset(training_dataset_name)
if not training_full.exists():
    training_full = project.create_dataset(training_dataset_name,
                                           type='Filesystem',
                                           params=params,
                                           formatType='csv',
                                           formatParams=format_params)
    # mark the new dataset as managed
    ds_def = training_full.get_definition()
    ds_def['managed'] = True
    training_full.set_definition(ds_def)

# get the sql code
sql_file_path = sql_file_name + '.sql'
with open(sql_file_path, 'r') as fd:
    sqlFile = fd.read()

# use the sql code to make a pandas dataframe
executor = SQLExecutor2(dataset=dataiku.Dataset(connection_dataset_name, project_key=project_id, ignore_flow=True))
training_full_df = executor.query_to_df(sqlFile)

# Try to write the pandas dataframe to the csv dataset
dataset_training_full = dataiku.Dataset(training_dataset_name, project_key=project_id, ignore_flow=True)
dataset_training_full.write_with_schema(training_full_df)
the error:
Oops: an unexpected error occurred Error in Python process: At line 145: <class 'Exception'>: Dataset TEST_PLUGIN_DEV_CJ.training_full cannot be used : declare it as input or output of your recipe
The funny thing is that if I run the recipe twice in a row... it works
Does anyone have experience trying to make datasets with code? Eventually I want this to be a plugin, so that's why I was trying to run it as a recipe. Maybe there is a better option?!
Thanks
CJ
Operating system used: Ubuntu
Answers
Turribeach
I am guessing that if you are creating datasets on the fly, you might need to update the recipe via the API to include the new input/output. Have a look at the dataset API methods. This check is only enforced for recipes, which is why it works in a notebook.
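To make the suggestion above a bit more concrete, here is a rough sketch of declaring the new dataset as a recipe output through the public API. It assumes `project` is a `dataikuapi` project handle and that the recipe settings object exposes `get_flat_output_refs()`, `add_output()` and `save()`, which is how I remember the `dataikuapi` recipe settings working; please check against your installed version. The helper name `ensure_recipe_output`, the recipe name, and the role name `"main"` are illustrative assumptions (plugin recipes define their own role names). The thread does not confirm whether a change made while the recipe is already running is picked up by that same run, so it is probably safer to call this before the recipe starts.

```python
def ensure_recipe_output(project, recipe_name, dataset_name, role="main"):
    """Declare dataset_name as an output of a recipe unless it already is one.

    `project` is expected to be a dataikuapi project handle, e.g. obtained
    with dataiku.api_client().get_project(project_id).
    The role name "main" is an assumption; plugin recipes define their own roles.
    Returns True if the recipe was modified, False if nothing had to change.
    """
    recipe = project.get_recipe(recipe_name)
    settings = recipe.get_settings()

    # get_flat_output_refs() lists the output references across all roles
    if dataset_name in settings.get_flat_output_refs():
        return False

    settings.add_output(role, dataset_name)
    settings.save()
    return True


# Hypothetical usage, run from a notebook or scenario step before the recipe:
# import dataiku
# project = dataiku.api_client().get_project(project_id)
# ensure_recipe_output(project, "my_recipe_name", training_dataset_name)
```

Because the helper checks the existing outputs first, it can be called on every run without adding the same dataset twice.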