Hi All,
I am on DSS 8.0.1.
I am trying to create a custom recipe, and one of its steps is to read a dataset from another project (Project B).
I am trying to see if I can use the dataset from Project B in the recipe without exposing the dataset to Project A.
(The dataset is called directly in the Python code and is not an Input Role.)
I use code like the line below.
df = dataiku.Dataset('datasetname', project_key='projectkey', ignore_flow=True).get_dataframe(infer_with_pandas=True)
For some reason this code works in a notebook in Project A.
But when it runs as a recipe in Project A, it fails with this error:
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 99: <class 'Exception'>: Unable to fetch schema for projectkey.datasetname: b'dataset does not exist: projectkey.datasetname'
Can someone suggest whether this is feasible?
Hi @NN ,
The reason this works in a notebook but not in a recipe is that a recipe requires its input datasets to be explicitly declared through the UI, not just referenced in the code. If your code references a dataset that is not set as an input, it will fail (hence the idea of the Flow: any input to a recipe should be clear visually).
I would recommend first sharing the dataset from Project B to Project A, and then setting it as an input to the Python recipe. Or, if you want to do this through the Python API, the copy_to method of the Dataset class should help.
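As a rough sketch of the first option, and assuming the dataset has already been shared to Project A and added as an input of the recipe in the UI, the recipe code could read it through its full PROJECTKEY.datasetname reference. The helper names, "PROJECTB", and "datasetname" below are placeholders for illustration, not part of the Dataiku API.

```python
def full_dataset_ref(project_key, dataset_name):
    """Build the 'PROJECTKEY.datasetname' reference for a dataset owned by another project."""
    return "{}.{}".format(project_key, dataset_name)


def read_shared_dataset(project_key, dataset_name, dataset_cls=None):
    """Read a dataset shared from another project into a pandas DataFrame.

    Assumption: the dataset is shared with the current project and is
    declared as an input of the recipe, so no ignore_flow flag is needed.
    """
    if dataset_cls is None:
        # Imported lazily because the dataiku package only exists inside DSS.
        import dataiku
        dataset_cls = dataiku.Dataset
    return dataset_cls(full_dataset_ref(project_key, dataset_name)).get_dataframe()


# Inside the recipe in Project A (placeholder names):
# df = read_shared_dataset("PROJECTB", "datasetname")
```

The dataset_cls parameter is only there so the helper can be exercised outside DSS with a stand-in class; inside a recipe it can be left at its default.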
Best,
Katie
Thanks, Katie. That makes sense. Sharing the dataset definitely works fine.
I misunderstood the ignore_flow option.