Cannot import a Dataiku dataset that is not set as an input in a Python recipe on 5.1.0

Solved!
casper
Level 2
Hi there,

I am running Dataiku 5.1.0, and the 5.1.0 release notes say:

  • It is now possible to use datasets in a Python or R recipe, even if they are not declared as inputs or outputs

I tried this today in a Python 3.6 environment. In the Jupyter notebook it works fine, but as soon as I saved the notebook back to the recipe and ran the recipe itself, it immediately failed with:

Job failed: Error in Python process: At line 21: <class 'Exception'>: Dataset jira_projects cannot be used : declare it as input or output of your recipe

Apparently the feature is not working as expected.

Clément_Stenac
Hi,

You need to add ignore_flow=True to the constructor of the Dataset() class.
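A minimal sketch of the fix, assuming the recipe runs on DSS 5.1 or later, where `dataiku.Dataset` accepts `ignore_flow`. The essential line is `dataiku.Dataset("jira_projects", ignore_flow=True)`; the wrapper `read_outside_flow` and its `dataset_cls` parameter are hypothetical, added only so the code can be tried outside DSS with a stand-in class.

```python
def read_outside_flow(name, dataset_cls=None):
    """Read a dataset that is not declared as an input or output of the recipe.

    `dataset_cls` is a hypothetical testing hook; by default it is dataiku.Dataset.
    """
    if dataset_cls is None:
        import dataiku  # only importable inside a DSS Python environment
        dataset_cls = dataiku.Dataset
    # ignore_flow=True lets the recipe use a dataset that is not declared
    # as an input or output (the 5.1 feature from the release notes).
    dataset = dataset_cls(name, ignore_flow=True)
    return dataset.get_dataframe()


# Inside the recipe:
# df = read_outside_flow("jira_projects")
```

Note that reading a dataset this way hides the dependency from the Flow, so the recipe will not be rebuilt or tracked when `jira_projects` changes.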
casper
Level 2
Author
Thanks, this solved it.

Where could I have found this information?
Alex_Combessie
Dataiker Alumni
You can find the reference documentation for the Dataiku API Dataset class here: https://doc.dataiku.com/dss/latest/python-api/datasets.html#dataiku.Dataset
idoiku
Level 1
How can this be solved if you are running an earlier DSS version (5.0.2)?

We are getting the same Python exception. Specifically, a dataset I created in Project-A is being used in a recipe in Project-B. This dataset IS INCLUDED as an input to the Python recipe, but we are getting:

Job failed: Error in Python process: At line 21: : Dataset cannot be used : declare it as input or output of your recipe
Alex_Combessie
Dataiker Alumni
This option was added in version 5.1, so you will need to upgrade to use it.
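If a recipe has to run on more than one DSS version, one option is to check whether the installed `dataiku` package accepts `ignore_flow` before relying on it. This is a sketch that assumes the option is exposed as a named constructor parameter; `supports_ignore_flow` is a hypothetical helper, not part of the Dataiku API.

```python
import inspect


def supports_ignore_flow(cls):
    """Return True if cls's constructor declares an `ignore_flow` parameter.

    Only explicitly named parameters count; a constructor that merely
    accepts **kwargs is treated as not supporting the option.
    """
    try:
        params = inspect.signature(cls).parameters
    except (TypeError, ValueError):
        # Some callables (e.g. certain builtins) expose no signature.
        return False
    return "ignore_flow" in params


# Inside a recipe (hypothetical usage):
# import dataiku
# if not supports_ignore_flow(dataiku.Dataset):
#     raise RuntimeError("ignore_flow needs DSS 5.1+; declare the dataset as a recipe input")
```

Failing early with an explicit message is easier to diagnose than the `TypeError` an older constructor would raise for an unknown keyword.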
idoiku
Level 1
Hi,

My question is: what workaround did users find before this feature was added?

Our IT will eventually upgrade DSS, but I am looking for an immediate workaround.

Can anyone confirm whether there is really no other (pre-5.1.0) workaround?

The error says "add the dataset as an input", which would seem to be a workaround, but that is not working. Has anyone successfully "added the dataset as input"?

Is the fact that my dataset is in another project a factor? If so, the brute-force workaround is to copy the dataset between projects and then use the "local project copy" as an input, but I don't want to give that advice if it simply won't work.

Is the dataiku.dataset module source code available? If so, maybe I could create a local code snippet until our DSS is upgraded.

regards,
s.hotz
Alex_Combessie
Dataiker Alumni
Hi,
The recommended way to use datasets from other projects is to use the "Share" feature: https://doc.dataiku.com/dss/latest/security/exposed-objects.html#exposing-objects-between-projects. Once you have shared the datasets from project A to project B, you can add them as inputs to your Python recipes in project B, as you would for a dataset within project B. The only difference is the syntax: Dataset("PROJECT_KEY.dataset_name"), where PROJECT_KEY is the key of the source project. This is preferable to copying the datasets across projects because:
1. you avoid duplicating the data
2. shared datasets point to the same location, so they are always in sync
3. you maintain the full lineage of data across projects (which you would lose if you did not declare a dataset as an input of the recipe)
This applies to 5.1 and earlier versions.
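A sketch of reading a shared dataset, assuming it has already been shared into project B and added as an input of the recipe, and that its reference is the source project's key, a dot, and the dataset name. The project key `PROJECTA`, the dataset name `customers`, and both helper functions are hypothetical. No `ignore_flow` is passed, which keeps the cross-project dependency visible in the Flow lineage.

```python
def shared_dataset_ref(project_key, dataset_name):
    """Build a 'PROJECT_KEY.dataset_name' reference for a shared dataset."""
    if not project_key or not dataset_name:
        raise ValueError("both project_key and dataset_name are required")
    return f"{project_key}.{dataset_name}"


def read_shared_dataset(project_key, dataset_name, dataset_cls=None):
    """Read a dataset shared into this project and declared as a recipe input."""
    if dataset_cls is None:
        import dataiku  # only importable inside a DSS Python environment
        dataset_cls = dataiku.Dataset
    # No ignore_flow: the dataset is a declared input, so lineage is preserved.
    return dataset_cls(shared_dataset_ref(project_key, dataset_name)).get_dataframe()


# Inside a recipe in project B (hypothetical names):
# df = read_shared_dataset("PROJECTA", "customers")
```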
harikrishna
Level 1

I am also getting the same issue. Where do I need to add this in my snippet?

import dataiku

handle = dataiku.Folder("TokenAccessFolder")
handle

AlexT
Dataiker

Hi,


You can use ignore_flow with a folder as well:

handle = dataiku.Folder("<replace_with_folder_id>", ignore_flow=True)

Thanks,
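Applied to the folder from the previous post, a minimal sketch might look like this. It assumes DSS 5.1 or later and calls `list_paths_in_partition()` from the `dataiku.Folder` API to show that the folder is readable. `"TokenAccessFolder"` is the identifier from the question (use your folder's id, as suggested above, if that does not resolve); the helper `list_folder_outside_flow` and its `folder_cls` parameter are hypothetical.

```python
def list_folder_outside_flow(folder_id, folder_cls=None):
    """List the files of a managed folder that is not a recipe input or output."""
    if folder_cls is None:
        import dataiku  # only importable inside a DSS Python environment
        folder_cls = dataiku.Folder
    # ignore_flow=True works for managed folders as well as datasets.
    handle = folder_cls(folder_id, ignore_flow=True)
    return handle.list_paths_in_partition()


# Inside the recipe:
# paths = list_folder_outside_flow("TokenAccessFolder")
```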

Jennnnnny
Level 2

Hi,

It doesn't work for me. I got:

Job failed: Error in Python process: At line 20: <class 'TypeError'>: __init__() got an unexpected keyword argument 'ignor_flow'
