'Edit in notebook' in python recipe on partitioned dataset

rene Dataiku DSS Core Designer, Registered Posts: 3

For one application, transparency and understandability of the code are a high priority. For that reason we use the 'Edit in notebook' function on our Python recipes, so that new team members and others can skim through the code as if it were a notebook and study the output of each step.

However, this does not work on a partitioned dataset: the notebook loads a huge dataframe containing the entire dataset instead of just the latest partition.

Any workarounds?

Best Answer

Answers

  • rene
    rene Dataiku DSS Core Designer, Registered Posts: 3

    Yes, that's what I was looking for. I directly use add_read_partitions('CURRENT_DAY') so that the notebook shows only the latest data whenever it is run.

    Thanks!

  • rene
    rene Dataiku DSS Core Designer, Registered Posts: 3

    Update: I initially accepted this as the solution. However, the hasattr(dataiku, 'dku_flow_variables') check does NOT work when you use the 'Edit in notebook' option on a Python recipe: even when the code runs in the notebook, it still returns True.

  • apfk
    apfk Registered Posts: 1

    Not sure if you're still looking, but we use the native dataiku in_ipython to check whether code is running in a notebook or not.

    import dataiku
    dataiku.in_ipython

    I use it in my code libraries, and it seems able to differentiate between running those library functions in my notebook and running them through a scenario.

    Hope this helps!
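
Putting the two suggestions from this thread together, a minimal sketch might look like the following. It is not an official Dataiku pattern. The helper names in_notebook and read_input are hypothetical. The sketch assumes that dataiku.in_ipython exists as described above, without knowing whether it is a plain attribute or a callable, so it handles both. It also assumes that 'CURRENT_DAY' is accepted by add_read_partitions, as reported earlier in the thread. Outside DSS the dataiku import fails, so the sketch falls back to the __IPYTHON__ flag that IPython sets in builtins.

```python
import builtins

try:
    import dataiku  # only importable inside a DSS code environment
except ImportError:
    dataiku = None  # lets the sketch be read and tested outside DSS


def in_notebook():
    """Best-effort check for an interactive (IPython/Jupyter) session."""
    flag = getattr(dataiku, "in_ipython", None)
    if flag is not None:
        # Assumption: in_ipython may be a plain attribute or a callable.
        return bool(flag() if callable(flag) else flag)
    # Fallback: IPython sets __IPYTHON__ in builtins when a shell starts.
    return bool(getattr(builtins, "__IPYTHON__", False))


def read_input(dataset, notebook_partition="CURRENT_DAY", interactive=None):
    """Return a dataframe, restricted to one partition when in a notebook.

    `dataset` is expected to behave like dataiku.Dataset, i.e. to offer
    add_read_partitions() and get_dataframe().
    """
    if interactive is None:
        interactive = in_notebook()
    if interactive:
        # Only narrow the read in the notebook; in the flow, the recipe's
        # partition dependencies decide what gets read.
        dataset.add_read_partitions(notebook_partition)
    return dataset.get_dataframe()
```

In a recipe this would be called as read_input(dataiku.Dataset("my_partitioned_input")), where the dataset name is only a placeholder. Passing interactive=True or interactive=False explicitly overrides the detection, which can help if the check misbehaves in a particular setup.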
