Question on Dataset ignore_flow in a recipe

NN Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 145 Neuron

Hi All,

I am on DSS 8.0.1.

I am trying to create a custom recipe, and one of its steps reads a dataset from another project (project B).
I would like to use the dataset from project B in the recipe without exposing it to project A
(the dataset is read directly in the Python code and is not an input role).

I use code like the following:

import dataiku
df = dataiku.Dataset('datasetname', project_key='projectkey', ignore_flow=True).get_dataframe(infer_with_pandas=True)

This code works in a notebook in project A,
but when it runs as a recipe in project A it fails with the following error:

com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 99: <class 'Exception'>: Unable to fetch schema for projectkey.datasetname: b'dataset does not exist: projectkey.datasetname'

Can someone tell me whether this is feasible?

Best Answer

  • Katie
    Katie Dataiker, Registered, Product Ideas Manager Posts: 105 Dataiker

    Hi @NN,

    This works in a notebook but not in a recipe because a recipe requires its input datasets to be declared explicitly through the UI, not just referenced in the code. If your code references a dataset that is not set as an input, the recipe fails. That is the idea behind the Flow: every input to a recipe should be visible.
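
In a custom recipe, a declared input is read through its role rather than through a hard-coded name. The sketch below is a minimal illustration, assuming the plugin's recipe.json declares an input role named input_dataset (hypothetical) and that the helper name load_role_inputs is also hypothetical. get_input_names_for_role (from dataiku.customrecipe) returns the full names of the datasets bound to a role; the two DSS callables are passed in as parameters so the logic can run outside DSS.

```python
# Sketch: read every dataset bound to an input role of a custom recipe.
# INPUT_ROLE is hypothetical; it must match a role declared in the
# plugin's recipe.json.
INPUT_ROLE = "input_dataset"


def load_role_inputs(role, get_names, read_dataframe):
    """Return {full_name: dataframe} for every dataset bound to `role`.

    get_names: callable mapping a role to a list of full dataset names,
        e.g. dataiku.customrecipe.get_input_names_for_role.
    read_dataframe: callable mapping a full name to a DataFrame,
        e.g. lambda name: dataiku.Dataset(name).get_dataframe().
    """
    frames = {}
    for full_name in get_names(role):
        # Full names have the form "PROJECTKEY.datasetname"; a dataset shared
        # from project B should keep project B's key in its full name.
        frames[full_name] = read_dataframe(full_name)
    return frames


# Inside the recipe (requires a DSS runtime, so shown as comments):
# import dataiku
# from dataiku.customrecipe import get_input_names_for_role
# frames = load_role_inputs(
#     INPUT_ROLE,
#     get_input_names_for_role,
#     lambda name: dataiku.Dataset(name).get_dataframe(),
# )
```

Because the dataset is bound to a role, it appears as an input in the Flow, which is exactly what the recipe check expects.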

    I would recommend first sharing the dataset from project B with project A, and then setting it as an input to the Python recipe. Alternatively, if you want to do this through the Python API, the copy_to method of the Dataset class should help.
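
For the API route, a minimal sketch built on copy_to from the public API client might look like the following. It assumes the target dataset (here a hypothetical datasetname_copy in project A) already exists, that copy_to is available in your DSS version, and that the helper name copy_shared_dataset is hypothetical. Note that this copies the data, so project A ends up with its own copy rather than a reference to project B's dataset. The client is passed in so the logic can be exercised outside DSS; inside DSS you would pass dataiku.api_client().

```python
def copy_shared_dataset(client, source_project_key, source_name,
                        target_project_key, target_name,
                        write_mode="OVERWRITE"):
    """Copy the data of one DSS dataset into another, existing dataset.

    client: a public-API client, e.g. dataiku.api_client() when running
        inside DSS.
    The target dataset is assumed to already exist in the target project.
    """
    source = client.get_project(source_project_key).get_dataset(source_name)
    target = client.get_project(target_project_key).get_dataset(target_name)
    # synchronous=True waits for the copy job to finish before returning.
    return source.copy_to(target, synchronous=True, write_mode=write_mode)


# Inside DSS (project keys and dataset names are hypothetical):
# import dataiku
# copy_shared_dataset(dataiku.api_client(), "PROJECTB", "datasetname",
#                     "PROJECTA", "datasetname_copy")
```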

    Best,

    Katie
