# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
project_name = "ABC_PEI_2023"
project = dataiku.api_client().get_project(project_name)
prefix = 'PATIENT_POOL'
dataset_names = [dataset['name'] for dataset in project.list_datasets() if dataset['name'].startswith(prefix)]
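frames = []
for name in dataset_names:
    print(name)
    # Read each matching dataset into a pandas DataFrame
    frames.append(dataiku.Dataset(name).get_dataframe())
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames in one call
appended_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()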
# Write recipe outputs
MASTER_PATIENT_POOL = dataiku.Dataset("MASTER_PATIENT_POOL")
MASTER_PATIENT_POOL.write_with_schema(appended_data)
I am new to Dataiku. I wrote the code above to build a single data frame by appending all the datasets in the project ABC_PEI_2023 whose names start with the prefix "PATIENT_POOL".
For now I have two datasets, PATIENT_POOL_BRAND1 and PATIENT_POOL_BRAND2 (more datasets with the same prefix are expected in the future). These datasets are created in Dataiku and stored on Snowflake.
Since more datasets may be added in the future, I need every matching dataset to be appended by this Python recipe, even if it is not declared as an input to the recipe. (For testing purposes, only "PATIENT_POOL_BRAND1" is declared as an input to the Python recipe above.)
When I run the code above in a Jupyter notebook, it works fine and all the datasets are appended. But when I save it back to the recipe and run it, it fails with the following error:
Error: Job failed: Error in python process: At line 25: <class 'Exception'>: Dataset PATIENT_POOL_BRAND2 cannot be used : declare it as input or output of your recipe
Please help me resolve this issue.
Hi, please edit your post, insert a code block (see screenshot below), and paste your code again so that it can be properly interpreted. Python relies on indentation, so the code needs to be in a code block before it can be copied and pasted correctly.
The error you get is expected. A recipe can't read datasets that are not declared as its inputs, and you can't dynamically edit a recipe's inputs using the API. The Stack recipe can "stack" datasets into a single output. You can add any datasets as inputs, including by searching for them. However, this won't be dynamic. Where are these datasets stored?
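To make the failure mode concrete, here is a minimal sketch of a pre-check that separates the prefix-matched dataset names into those the recipe may read and those that would trigger the "declare it as input or output" error. The helper name `split_by_declared` and the hard-coded list of declared inputs are hypothetical, introduced only for illustration. In a real recipe the declared inputs would have to come from the recipe's own configuration.

```python
def split_by_declared(candidate_names, declared_inputs, prefix):
    """Split prefix-matched names into (usable, missing).

    usable:  names that match the prefix and are declared recipe inputs
    missing: names that match the prefix but are NOT declared, so reading
             them inside the recipe would raise the error from the question
    """
    declared = set(declared_inputs)
    matching = [name for name in candidate_names if name.startswith(prefix)]
    usable = [name for name in matching if name in declared]
    missing = [name for name in matching if name not in declared]
    return usable, missing


# Hypothetical values mirroring the setup described in the question.
all_names = ["PATIENT_POOL_BRAND1", "PATIENT_POOL_BRAND2", "MASTER_PATIENT_POOL"]
declared = ["PATIENT_POOL_BRAND1"]  # assumption: the only declared input

usable, missing = split_by_declared(all_names, declared, "PATIENT_POOL")
print("can be read:", usable)
print("must be declared first:", missing)
```

Only the names in `usable` could then be read with `dataiku.Dataset(name).get_dataframe()` and combined with `pd.concat`. Anything in `missing` still has to be added as a recipe input (or to a Stack recipe) by hand.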