Error while using datasets which are not declared as inputs on Python recipe.
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

project_name = "ABC_PEI_2023"
project = dataiku.api_client().get_project(project_name)
prefix = 'PATIENT_POOL'
dataset_names = [dataset['name'] for dataset in project.list_datasets() if dataset['name'].startswith(prefix)]

appended_data = pd.DataFrame()
for i in dataset_names:
    print(i)
    data = dataiku.Dataset(i).get_dataframe()
    appended_data = appended_data.append(data,ignore_index = True)

# Write recipe outputs
MASTER_PATIENT_POOL = dataiku.Dataset("MASTER_PATIENT_POOL")
MASTER_PATIENT_POOL.write_with_schema(appended_data)
I am new to Dataiku. I wrote the code above to build a single data frame by appending all the datasets in the project ABC_PEI_2023 whose names start with the prefix "PATIENT_POOL".
For now I have 2 datasets, PATIENT_POOL_BRAND1 and PATIENT_POOL_BRAND2 (more datasets with the same prefix are expected in the future). These datasets are created in Dataiku and stored on Snowflake.
Since more datasets may be added in the future, I need every matching dataset to be appended by this Python recipe even if it is not declared as an input to the recipe. (For testing purposes, only "PATIENT_POOL_BRAND1" is declared as an input to the Python recipe above.)
When I run the code above in a Jupyter notebook, it works fine and all the datasets are appended. But when I save it back to the recipe and run it, the job fails with the following error.
Error : Job failed: Error in python process: At line 25: <class 'Exception'>: Dataset PATIENT_POOL_BRAND2 cannot be used : declare it as input or output of your recipe
Please help me resolve this issue.
Answers
Turribeach
Hi, please edit your post, insert a code block (see screenshot below), and paste your code again so that it can be read properly. Python relies on indentation, so the code as posted can't be copied and pasted.
The error you get is expected. A Python recipe can't read datasets that are not declared as its inputs, and you can't dynamically edit a recipe's inputs using the API. The Stack recipe can "stack" datasets into a single output, and you can add any datasets as inputs, including by searching for them, but this won't be dynamic either. Where are these datasets stored?
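To make the difference between the notebook and the recipe concrete, here is a minimal sketch of a guard you could put at the top of the recipe. It is plain Python and does not call the Dataiku API. The helper name `split_by_declaration` and the hand-maintained `declared_inputs` list are assumptions for illustration only; the dataset names are the ones from the question.

```python
def split_by_declaration(project_dataset_names, declared_inputs, prefix):
    """Split the prefixed dataset names into (readable, undeclared) lists."""
    declared = set(declared_inputs)
    # Same prefix filter as the list comprehension in the question.
    matching = sorted(n for n in project_dataset_names if n.startswith(prefix))
    readable = [n for n in matching if n in declared]
    undeclared = [n for n in matching if n not in declared]
    return readable, undeclared


# In a real recipe this list would come from project.list_datasets(),
# as in the original code. Hard-coded here so the sketch runs on its own.
project_names = ["PATIENT_POOL_BRAND1", "PATIENT_POOL_BRAND2", "MASTER_PATIENT_POOL"]

# Assumption: kept in sync by hand with the recipe's declared inputs.
declared_inputs = ["PATIENT_POOL_BRAND1"]

readable, undeclared = split_by_declaration(project_names, declared_inputs, "PATIENT_POOL")
print("readable:", readable)
print("undeclared:", undeclared)
```

With the setup from the question, PATIENT_POOL_BRAND2 ends up in `undeclared`, which is the dataset the job complains about. Inside the recipe you would loop over `readable` only, and fail or warn with a clear message when `undeclared` is not empty, so that a newly created dataset is a reminder to add it as an input instead of a mid-run failure.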