Error while using datasets which are not declared as inputs on Python recipe.

aaryasoman Registered Posts: 1
edited July 16 in General Discussion

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

project_name = "ABC_PEI_2023"

project = dataiku.api_client().get_project(project_name)

prefix = 'PATIENT_POOL'

dataset_names = [dataset['name'] for dataset in project.list_datasets() if dataset['name'].startswith(prefix)]

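# Read every matching dataset, then concatenate once.
# (DataFrame.append was removed in pandas 2.0; pd.concat works on all versions.)
frames = [dataiku.Dataset(name).get_dataframe() for name in dataset_names]

appended_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()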

# Write recipe outputs

I am new to Dataiku. The code above builds a single data frame by appending every dataset in the project ABC_PEI_2023 whose name starts with the prefix "PATIENT_POOL".
For now I have two datasets, PATIENT_POOL_BRAND1 and PATIENT_POOL_BRAND2 (more datasets with the same prefix are expected in future). These datasets are created in Dataiku and stored on Snowflake.
Since more datasets may be added in future, I need every matching dataset to be appended by this Python recipe, even if it is not declared as an input to the recipe. (For testing purposes, only "PATIENT_POOL_BRAND1" is declared as an input to the Python recipe.)
When I run the code in a Jupyter notebook, it works fine and all the datasets are appended. But when I save it back to the recipe and run it, it fails with the following error.

Error : Job failed: Error in python process: At line 25: <class 'Exception'>: Dataset PATIENT_POOL_BRAND2 cannot be used : declare it as input or output of your recipe

Please help me resolve this issue.



  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,724 Neuron

    Hi, please edit your post, insert a code block (see screenshot below), and paste your code again so that it can be read properly. As you know, Python enforces indentation, so your code as posted can't be copied and pasted.

    The error you get is expected. You can't use datasets that are not declared as inputs of your recipe. You also can't dynamically edit a recipe's inputs using the API. The Stack recipe can "stack" datasets into a single output, and you can add any datasets as inputs, including by searching for them. However, this won't be dynamic. Where are these datasets stored?

    Screenshot 2023-12-13 at 19.41.45.png
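
To make the constraint concrete, here is a minimal sketch in plain Python (no Dataiku calls) that separates the prefix-matching datasets a recipe could read from those that would trigger the "declare it as input or output" error. The helper name `split_by_declaration` and the sample lists are hypothetical and only mirror the names in the question. In a real recipe the project names would come from `project.list_datasets()` as in the posted code, and the declared inputs are whatever is configured on the recipe.

```python
def split_by_declaration(project_names, declared_inputs, prefix):
    """Return (readable, undeclared) dataset names that start with `prefix`.

    readable   -> matching names that are declared as recipe inputs
    undeclared -> matching names that the recipe would refuse to read
    """
    declared = set(declared_inputs)
    matching = sorted(name for name in project_names if name.startswith(prefix))
    readable = [name for name in matching if name in declared]
    undeclared = [name for name in matching if name not in declared]
    return readable, undeclared


# Illustrative values mirroring the question: only BRAND1 is declared as an input.
project_names = ["PATIENT_POOL_BRAND1", "PATIENT_POOL_BRAND2", "OTHER_DATA"]
declared_inputs = ["PATIENT_POOL_BRAND1"]

readable, undeclared = split_by_declaration(
    project_names, declared_inputs, "PATIENT_POOL"
)
print("readable:", readable)
print("undeclared:", undeclared)
```

Inside the recipe, only the names in `readable` could then be loaded with `dataiku.Dataset(name).get_dataframe()`. Anything in `undeclared` would first have to be added to the recipe's inputs, or stacked with a Stack recipe as suggested above.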
