Get access to the target partition within a Python recipe

s-cordo Registered Posts: 6 ✭✭✭✭
edited July 16 in Using Dataiku

Hi,

I would like to have access, within my Python recipe, to the partition that will be built. In fact, I would like to adapt my code depending on the targeted partition.

The recipe takes a non-partitioned HDFS dataset as input and writes to a file-based partitioned HDFS dataset whose partitioning dimension is categorical.

I tried to use what is described there, but the function

dataset.get_write_partition()

didn't work for me.

Is there a way to accomplish what I want?

Thanks

PS: I'm using DSS 7.0

dataiku_code_python_recipe.JPG

Best Answer

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker
    edited July 17 Answer ✓

    Hi @s-cordo

    You can get the name of the partition being built from the dku_flow_variables Python dictionary, available as dataiku.dku_flow_variables.
    In your example, since your partitioning dimension is named thematique_name, you should be able to access its value using

    dataiku.dku_flow_variables["DKU_DST_thematique_name"]

    dataset.get_write_partition() is deprecated. We'll update the link you shared, thanks for the heads-up!

    Have a great day!
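
As a rough illustration of the lookup described above, here is a minimal sketch. The helper name get_target_partition and the value "sports" are made up for the example; only the dku_flow_variables dictionary and the "DKU_DST_" + dimension name key come from this thread. A stand-in dictionary is used so the snippet runs outside DSS.

```python
def get_target_partition(flow_variables, dimension_name):
    """Return the value of the output partition being built for one dimension."""
    key = "DKU_DST_" + dimension_name
    if key not in flow_variables:
        raise KeyError("No flow variable %r; is the code running from the recipe?" % key)
    return flow_variables[key]


# In a DSS Python recipe, the real dictionary would be passed in:
#   import dataiku
#   partition = get_target_partition(dataiku.dku_flow_variables, "thematique_name")

# Stand-in dictionary so the sketch runs outside DSS; "sports" is a made-up value.
example_flow_variables = {"DKU_DST_thematique_name": "sports"}
partition = get_target_partition(example_flow_variables, "thematique_name")

# Adapt the processing to the targeted partition.
if partition == "sports":
    message = "running the sports-specific logic"
else:
    message = "running the default logic"
print(message)
```

Passing the dictionary in as an argument keeps the branching logic testable without a DSS instance.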

Answers

  • s-cordo
    s-cordo Registered Posts: 6 ✭✭✭✭
    edited July 17

    Hi @dimitri,

    Thanks for your answer.

    However, I could not get your solution to work.

    dataiku_code_python_recipe_2.JPG

    Maybe

    dataiku.dku_flow_variables["DKU_DST_thematique_name"]

    doesn't exist in DSS 7.0?

    I tried

    dataiku.get_flow_variables()

    which looked equivalent, but I got a None value even though my partition seems to be well defined in the flow:

    dataiku_code_python_recipe_3.JPG

    Thanks for your help

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker

    This is because you are running the script from a notebook. The partition identifiers to build are configured on the recipe, so they cannot be accessed from a notebook: the dku_flow_variables dictionary is only defined when the code runs from the recipe.

    Note that both dataiku.dku_flow_variables and dataiku.get_flow_variables() will work and return the same result from the recipe, even with DSS 7.0.

    Hope it helps!
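
If the same code also has to be tried out in a notebook, one possible workaround is to fall back to a hand-picked partition when no flow variables are available. This is only a sketch under assumptions: resolve_partition, notebook_default and the sample values are hypothetical names for illustration, and the None case mirrors what was reported above for dataiku.get_flow_variables() in a notebook.

```python
def resolve_partition(flow_variables, dimension_name, notebook_default):
    """Return the partition being built, or a fallback outside a recipe run."""
    # In a notebook there is no recipe run, so flow variables may be None or empty.
    if not flow_variables:
        return notebook_default
    return flow_variables.get("DKU_DST_" + dimension_name, notebook_default)


# In DSS, the first argument would come from the dataiku package, e.g.:
#   import dataiku
#   flow_vars = dataiku.get_flow_variables()   # reported above as None in a notebook

# Notebook-like situation: nothing is defined, so the default is used.
in_notebook = resolve_partition(None, "thematique_name", "sports")

# Recipe-like situation: the flow variable wins over the default.
in_recipe = resolve_partition(
    {"DKU_DST_thematique_name": "culture"}, "thematique_name", "sports"
)
print(in_notebook, in_recipe)
```

The default is only meant for interactive development; when the recipe runs, the value configured on the recipe is the one that is used.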

  • s-cordo
    s-cordo Registered Posts: 6 ✭✭✭✭

    Indeed, it worked like a charm inside my Python recipe.

    Thanks a lot!
