I'm trying to build a piece of code that runs as a recipe. The code builds two datasets, one of which is a database connection, and then runs a SQL query to build out the other. My issue is that when I run the recipe and try to write to the dataset, I keep getting a message that it can't be found. It's obviously there, because I can see it in the flow where the dataset was instantiated. Here is the basic code:
import dataiku
from dataiku import SQLExecutor2

# build database connection
dataset_params = {
    "connection": connection,
    "schema": schema_name,
    "table": table_name
}
dataset_teradata = project.get_dataset(connection_dataset_name)
if not dataset_teradata.exists():
    dataset_teradata = project.create_dataset(
        dataset_name=connection_dataset_name,
        type=type_database_connection,
        params=dataset_params,
        formatParams={}
    )
# create csv dataset shell
path_to_training_full = project_id + '/' + training_dataset_name
params = {'connection': 'filesystem_managed', 'path': path_to_training_full}
format_params = {'separator': '\t', 'style': 'unix', 'compress': ''}
training_full = project.get_dataset(training_dataset_name)
if not training_full.exists():
    training_full = project.create_dataset(training_dataset_name,
                                           type='Filesystem',
                                           params=params,
                                           formatType='csv',
                                           formatParams=format_params)
ds_def = training_full.get_definition()
ds_def['managed'] = True
training_full.set_definition(ds_def)
# get the sql code
sql_file_path = sql_file_name + '.sql'
with open(sql_file_path, 'r') as fd:
    sqlFile = fd.read()
# use the sql code to make a pandas dataframe
executor = SQLExecutor2(dataset=dataiku.Dataset(connection_dataset_name, project_key=project_id, ignore_flow=True))
training_full_df = executor.query_to_df(sqlFile)
# Try to write the pandas dataframe to the csv dataset
dataset_training_full = dataiku.Dataset(training_dataset_name, project_key=project_id, ignore_flow=True)
dataset_training_full.write_with_schema(training_full_df)
The error:
Oops: an unexpected error occurred
Error in Python process: At line 145: <class 'Exception'>: Dataset TEST_PLUGIN_DEV_CJ.training_full cannot be used : declare it as input or output of your recipe
The funny thing is that if I run the recipe twice in a row... it works 😟 (and if I run it in a notebook, it works the first time)
Does anyone have experience trying to make datasets with code? Eventually I want this to be a plugin, so that's why I was trying to run it as a recipe. Maybe there is a better option?! 🤞
Thanks
CJ
Operating system used: Ubuntu
I am guessing that if you are creating datasets on the fly, you may need to update the recipe via the API so that it includes the new input/output. Have a look at the dataset API methods. This check is only enforced for recipes, which is why it works in a notebook.
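
To make the suggestion a bit more concrete, here is a minimal sketch of declaring a dataset as a recipe output through the public API. It assumes `project` is a `dataikuapi` project handle (for example from `dataiku.api_client().get_project(project_id)`), that the recipe is a code recipe whose output role is `"main"`, and that the recipe name is known; `declare_recipe_output` is a helper name made up for this post, not part of the Dataiku API. I have not verified whether a change saved while the recipe is already running applies to that same run, so it is probably safer to do this in a setup step before the recipe is built.

```python
def declare_recipe_output(project, recipe_name, dataset_name, role="main"):
    """Declare dataset_name as an output of recipe_name if it is not one yet.

    `project` is assumed to be a dataikuapi DSSProject handle.
    `role` defaults to "main", the output role assumed here for code recipes.
    Returns True if the recipe settings were changed and saved, else False.
    """
    recipe = project.get_recipe(recipe_name)
    settings = recipe.get_settings()

    # get_flat_output_refs() lists the output refs across all roles
    if dataset_name in settings.get_flat_output_refs():
        return False

    settings.add_output(role, dataset_name)
    settings.save()
    return True


# Example call inside DSS (the recipe name is a placeholder):
# import dataiku
# project = dataiku.api_client().get_project(project_id)
# declare_recipe_output(project, "my_recipe_name", training_dataset_name)
```

The helper checks the existing outputs first so that it can be called on every run without adding the same dataset twice.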