Cannot import Dataiku dataset in Python recipe which is not set as input on 5.1.0
casper
Registered Posts: 42 ✭✭✭✭
Hi there,
I am running Dataiku 5.1.0 and in the release notes of 5.1.0 it says:
- It is now possible to use datasets in a Python or R recipe, even if they are not declared as inputs or outputs
I tried using this today in a Python 3.6 environment. In the Jupyter notebook it seems to work just fine, but as soon as I saved it back to the recipe and ran the recipe itself, it immediately gave me the error:
Job failed: Error in Python process: At line 21: <class 'Exception'>: Dataset jira_projects cannot be used : declare it as input or output of your recipe
Apparently it isn't functioning as expected.
Best Answer
-
Hi,
You need to pass ignore_flow=True in the constructor of the Dataset() class.
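For example, in the recipe you could do something like this (a minimal sketch, using the dataset name from the question; get_dataframe() loads the dataset into a pandas DataFrame):
import dataiku
# ignore_flow=True lets the recipe read a dataset that is not declared as an input or output
ds = dataiku.Dataset("jira_projects", ignore_flow=True)
df = ds.get_dataframe()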
Answers
-
Thanks, this solved it.
Where could I have found this information? -
You can find reference documentation on our Dataiku API Dataset class here: https://doc.dataiku.com/dss/latest/python-api/datasets.html#dataiku.Dataset
-
How is this solved if you are running an earlier DSS version (5.0.2)?
We are getting the same Python exception. Specifically, a dataset I created in Project-A is being used in a recipe in Project-B ... this dataset IS INCLUDED as an input to the Python recipe, but we are getting:
Job failed: Error in Python process: At line 21: : Dataset cannot be used : declare it as input or output of your recipe -
This option was added in version 5.1 so you will need to upgrade to use it.
-
Hi,
My question is: what workaround did users find before this feature was added?
Our IT will eventually upgrade DSS, but I am looking for an immediate workaround.
Can anyone confirm that there is no other (pre-5.1.0) workaround?
The error says to "add the dataset as an input", which would seem to be a workaround, but that is not working for us. Has anyone successfully added a dataset as an input this way?
Is the fact that my dataset is in another project a factor? If so, the brute-force workaround is to copy the dataset between projects and then use the "local project copy" as an input ... but I don't want to give this advice if it simply won't work.
Is the dataiku.dataset module source code available? If so, maybe I could create a local code snippet until our DSS is upgraded.
regards,
s.hotz -
Hi,
The recommended way to use datasets from other projects is the "Share" feature: https://doc.dataiku.com/dss/latest/security/exposed-objects.html#exposing-objects-between-projects. Once you have shared the datasets from project A to project B, you can add them to your Python recipes in project B just as you would a dataset belonging to project B. The only difference is a slightly different syntax: Dataset("PROJECT_A_KEY.dataset_name"), i.e. the dataset name prefixed with the source project key (see the sketch after the list below). This is preferred to copying the datasets across projects because:
1. you avoid duplicating the data
2. shared datasets point to the same location so are always in sync
3. you maintain the full lineage of data across projects (which you would lose if you do not declare a dataset as input in the recipe)
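For example, in a Python recipe in project B, a minimal sketch could look like this (PROJECT_A_KEY and shared_dataset are placeholder names for the source project key and the shared dataset):
import dataiku
# A shared dataset is referenced as "<source project key>.<dataset name>"
ds = dataiku.Dataset("PROJECT_A_KEY.shared_dataset")
df = ds.get_dataframe()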
This applies to 5.1 and before. -
I am also getting the same issue. Where do I need to add this snippet?
import dataiku
handle = dataiku.Folder("TokenAccessFolder")
handle -
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker
Hi,
You can use ignore_flow with a folder as well:
handle = dataiku.Folder("<replace_with_folder_id>", ignore_flow=True)
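Applied to the snippet above, the recipe code would look something like this (a minimal sketch, using the folder name from the question; list_paths_in_partition() is just one example of what you can do with the handle):
import dataiku
# ignore_flow=True lets the recipe use a folder that is not declared as an input or output
handle = dataiku.Folder("TokenAccessFolder", ignore_flow=True)
paths = handle.list_paths_in_partition()  # list the files stored in the folder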
Thanks,
-
Hi,
It doesn't work for me, I got:
Job failed: Error in Python process: At line 20: <class 'TypeError'>: __init__() got an unexpected keyword argument 'ignor_flow'