Cannot import Dataiku dataset in Python recipe which is not set as input on 5.1.0
casper
Registered Posts: 42 ✭✭✭✭
Hi there,
I am running Dataiku 5.1.0 and in the release notes of 5.1.0 it says:
- It is now possible to use datasets in a Python or R recipe, even if they are not declared as inputs or outputs
I tried using this today in a Python 3.6 environment. In the Jupyter notebook it seems to work just fine, but as soon as I saved it back to the recipe and ran the recipe itself, it immediately gave me the error:
Job failed: Error in Python process: At line 21: <class 'Exception'>: Dataset jira_projects cannot be used : declare it as input or output of your recipe
Apparently it isn't functioning as expected.
Best Answer
-
Hi,
You need to pass ignore_flow=True in the constructor of the Dataset() class.
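For example, in the recipe you could do something like this (a minimal sketch, using the dataset name from the question; get_dataframe() loads the dataset into a pandas DataFrame):
import dataiku
# ignore_flow=True lets the recipe read a dataset that is not declared as an input or output
ds = dataiku.Dataset("jira_projects", ignore_flow=True)
df = ds.get_dataframe()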
Answers
-
Thanks, this solved it.
Where could I have found this information? -
You can find reference documentation on our Dataiku API Dataset class here: https://doc.dataiku.com/dss/latest/python-api/datasets.html#dataiku.Dataset
-
How is this solved if you are running an earlier DSS version (5.0.2)?
We are getting the same Python exception. Specifically, a dataset I created in Project-A is being used in a recipe in Project-B ... this dataset IS INCLUDED as an input to the Python recipe, but we are getting:
Job failed: Error in Python process: At line 21: : Dataset cannot be used : declare it as input or output of your recipe -
This option was added in version 5.1 so you will need to upgrade to use it.
-
Hi,
My question is: what workaround did users find before this feature was added?
Our IT will eventually upgrade DSS, but I am looking for an immediate workaround.
Can anyone confirm that there is no other (pre-5.1.0) workaround?
The error says to "add the dataset as an input", which would seem to be a workaround, but that is not working for us. Has anyone successfully added a dataset as an input this way?
Is the fact that my dataset is in another project a factor? If so, the brute-force workaround is to copy the dataset between projects and then use the "local project copy" as an input ... but I don't want to give this advice if it simply won't work.
Is the dataiku.dataset module source code available? If so, maybe I could create a local code snippet until our DSS is upgraded.
regards,
s.hotz -
Hi,
The recommended way to use datasets from other projects is the "Share" feature: https://doc.dataiku.com/dss/latest/security/exposed-objects.html#exposing-objects-between-projects. Once you have shared the datasets from project A to project B, you can add them to your Python recipes in project B just as you would a dataset belonging to project B. The only difference is a slightly different syntax: Dataset("PROJECT_A_KEY.dataset_name"), i.e. the dataset name prefixed with the source project key (see the sketch after the list below). This is preferred to copying the datasets across projects because:
1. you avoid duplicating the data
2. shared datasets point to the same location so are always in sync
3. you maintain the full lineage of data across projects (which you would lose if you do not declare a dataset as input in the recipe)
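For example, in a Python recipe in project B, a minimal sketch could look like this (PROJECT_A_KEY and shared_dataset are placeholder names for the source project key and the shared dataset):
import dataiku
# A shared dataset is referenced as "<source project key>.<dataset name>"
ds = dataiku.Dataset("PROJECT_A_KEY.shared_dataset")
df = ds.get_dataframe()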
This applies to 5.1 and before. -
I am also getting the same issue. Where do I need to add this snippet?
import dataiku
handle = dataiku.Folder("TokenAccessFolder")
handle -
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker
Hi,
You can use ignore_flow with a folder as well:
handle = dataiku.Folder("<replace_with_folder_id>", ignore_flow=True)
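Applied to the snippet above, the recipe code would look something like this (a minimal sketch, using the folder name from the question; list_paths_in_partition() is just one example of what you can do with the handle):
import dataiku
# ignore_flow=True lets the recipe use a folder that is not declared as an input or output
handle = dataiku.Folder("TokenAccessFolder", ignore_flow=True)
paths = handle.list_paths_in_partition()  # list the files stored in the folder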
Thanks,
-
Hi,
It doesn't work for me, I got:
Job failed: Error in Python process: At line 20: <class 'TypeError'>: __init__() got an unexpected keyword argument 'ignor_flow'