Can not import Dataiku dataset in Python recipe which is not set as input on 5.1.0

casper Registered Posts: 42 ✭✭✭✭

Hi there,

I am running Dataiku 5.1.0 and in the release notes of 5.1.0 it says:

  • It is now possible to use datasets in a Python or R recipe, even if they are not declared as inputs or outputs

I tried this today in a Python 3.6 environment. In the Jupyter notebook it seems to work just fine, but as soon as I saved it back to the recipe and ran the recipe itself, it immediately gave me this error:

Job failed: Error in Python process: At line 21: <class 'Exception'>: Dataset jira_projects cannot be used : declare it as input or output of your recipe

Apparently it isn't functioning as expected.

Best Answer

Answers

  • casper
    casper Registered Posts: 42 ✭✭✭✭
    Thanks, this solved it.

    Where could I have found this information?
  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    You can find reference documentation on our Dataiku API Dataset class here: https://doc.dataiku.com/dss/latest/python-api/datasets.html#dataiku.Dataset
  • idoiku
    idoiku Registered Posts: 2 ✭✭✭✭
    How is this solved if you are running an earlier DSS version (5.0.2)?

    We are getting the same Python exception. Specifically, a dataset created by me in Project-A is being used in a recipe in Project-B. This dataset IS INCLUDED as an input to the Python recipe, but we are getting:

    Job failed: Error in Python process: At line 21: : Dataset cannot be used : declare it as input or output of your recipe
  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    This option was added in version 5.1 so you will need to upgrade to use it.
  • idoiku
    idoiku Registered Posts: 2 ✭✭✭✭
    Hi,

    My question is: what workaround did users find before this feature was added?

    Our IT will eventually upgrade DSS, but I am looking for an immediate workaround.

    Can anyone confirm that there is no other (pre-5.1.0) workaround?

    The error says "add the dataset as an input", which would seem to be a workaround, but that is not working. Has anyone successfully "added dataset as input"?

    Is the fact that my dataset is in another project a factor? If so, the brute-force workaround is to copy the dataset between projects and then use the "local project copy" as an input, but I don't want to give this advice if it simply won't work.

    Is the dataiku.dataset module source code available? If so, maybe I could create a local code snippet until our DSS is upgraded.

    regards,
    s.hotz
  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Hi,
    The recommended way to use datasets from other projects is the "Share" feature: https://doc.dataiku.com/dss/latest/security/exposed-objects.html#exposing-objects-between-projects. Once you have shared the datasets from project A to project B, you can add them to your Python recipes in project B as you would for a dataset within project B. The only difference is a slightly different syntax: Dataset("PROJECT_KEY.dataset_name"), where PROJECT_KEY is the key of the project that shares the dataset. This is preferred to copying the datasets across projects because:
    1. you avoid duplicating the data
    2. shared datasets point to the same location, so they are always in sync
    3. you maintain the full lineage of data across projects (which you would lose if you did not declare a dataset as input in the recipe)
    This applies to 5.1 and earlier versions.
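
For illustration, reading a shared dataset in project B could look roughly like the sketch below. It assumes the shared dataset is addressed by the source project's key, a dot, and the dataset name, and that the dataset has already been shared and declared as a recipe input. The helper names and the "PROJECT_A" key are hypothetical; only dataiku.Dataset and get_dataframe come from the Dataiku Python API.

```python
def shared_dataset_ref(source_project_key, dataset_name):
    """Build the project-qualified name for a dataset shared from another project."""
    if not source_project_key or not dataset_name:
        raise ValueError("both the source project key and the dataset name are required")
    return "{}.{}".format(source_project_key, dataset_name)


def read_shared_dataset(source_project_key, dataset_name):
    """Read a shared dataset into a pandas DataFrame (only works inside DSS)."""
    # Imported lazily so the naming helper can be used and tested outside DSS.
    import dataiku

    dataset = dataiku.Dataset(shared_dataset_ref(source_project_key, dataset_name))
    return dataset.get_dataframe()
```

Inside a recipe in project B, this would be called as read_shared_dataset("PROJECT_A", "jira_projects"), with "PROJECT_A" replaced by the real key of the sharing project.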
  • harikrishna
    harikrishna Registered Posts: 2
    I am also getting the same issue. Where do I need to add this snippet?

    import dataiku

    handle = dataiku.Folder("TokenAccessFolder")
    handle

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    Hi,


    You can use ignore_flow with a folder as well:

    handle = dataiku.Folder("<replace_with_folder_id>", ignore_flow=True)

    Thanks,
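
As a rough sketch of how the ignore_flow argument might be used in a recipe, for a dataset as well as for a folder: this assumes a DSS version whose dataiku.Dataset and dataiku.Folder constructors accept ignore_flow (5.1 or later, going by this thread). The two wrapper functions are hypothetical helpers, not part of the dataiku package.

```python
def read_undeclared_dataset(dataset_name):
    """Read a dataset that is not declared as a recipe input or output."""
    # Lazy import: the dataiku package is only available inside DSS.
    import dataiku

    # ignore_flow=True skips the "declare it as input or output" check.
    dataset = dataiku.Dataset(dataset_name, ignore_flow=True)
    return dataset.get_dataframe()


def open_undeclared_folder(folder_id):
    """Get a handle on a managed folder that is not declared in the recipe."""
    import dataiku

    return dataiku.Folder(folder_id, ignore_flow=True)
```

Keep in mind the trade-off mentioned earlier in the thread: anything read this way is not declared in the recipe, so it will not appear in the data lineage.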

  • Jennnnnny
    Jennnnnny Registered Posts: 9
    Hi,

    It doesn't work for me. I got:

    Job failed: Error in Python process: At line 20: <class 'TypeError'>: __init__() got an unexpected keyword argument 'ignor_flow'
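
The message above names 'ignor_flow', so the keyword in the recipe is most likely misspelled; the argument shown earlier in this thread is ignore_flow. The sketch below reproduces the same kind of failure with a stand-in class. FakeFolder and try_open are hypothetical and not part of the dataiku package; they only mimic a constructor that accepts ignore_flow.

```python
class FakeFolder:
    """Hypothetical stand-in that accepts the same keyword as the call shown in this thread."""

    def __init__(self, lookup, ignore_flow=False):
        self.lookup = lookup
        self.ignore_flow = ignore_flow


def try_open(**kwargs):
    """Return the error message if construction fails, or None if it succeeds."""
    try:
        FakeFolder("TokenAccessFolder", **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


misspelled = try_open(ignor_flow=True)  # keyword missing the "e": raises TypeError
correct = try_open(ignore_flow=True)    # keyword as written earlier in the thread
```

Here misspelled holds an "unexpected keyword argument" message, while correct is None because the constructor accepted the argument.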
