Get access to the target partition within a Python recipe
Hi,
I would like to access, from within my Python recipe, the partition that is about to be built, so that I can adapt my code to the targeted partition.
The recipe takes a non-partitioned HDFS dataset as input and writes to a file-based partitioned HDFS dataset whose partitioning dimension is categorical.
I tried to use what is described there, but the function
dataset.get_write_partition()
didn't work for me.
Is there a way to accomplish what I want?
Thanks
PS: I'm using DSS 7.0.
Best Answer
-
Hi @s-cordo
You can access the partition name you want to build using the dku_flow_variables Python dictionary that you can access using dataiku.dku_flow_variables.
In your example, as your partitioning dimension name is thematique_name, you should be able to access its value using dataiku.dku_flow_variables["DKU_DST_thematique_name"]
dataset.get_write_partition() is deprecated, we'll update the link you shared, thanks for the heads up!
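To illustrate the lookup described above, here is a minimal sketch of branching on the target partition. The helper name target_partition, the stand-in dictionary, and the value "sports" are assumptions made up for illustration; inside the recipe you would pass dataiku.dku_flow_variables instead of the stand-in.

```python
def target_partition(flow_variables, dimension_name):
    """Return the value of the partition being built for one dimension.

    The destination partition is stored under the key
    "DKU_DST_<dimension name>", as described in the answer above.
    """
    return flow_variables["DKU_DST_" + dimension_name]


# Stand-in dictionary so this sketch runs outside DSS (hypothetical value).
# Inside the Python recipe, use dataiku.dku_flow_variables instead.
example_variables = {"DKU_DST_thematique_name": "sports"}

partition = target_partition(example_variables, "thematique_name")

# Adapt the processing to the partition being built.
if partition == "sports":
    print("Running the sports-specific branch")
else:
    print("Running the default branch for", partition)
```

Keeping the key construction in one small function avoids repeating the "DKU_DST_" prefix throughout the recipe.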
Have a great day!
Answers
-
Hi @dimitri,
Thanks for your answer.
However, I did not manage to apply your solution.
Maybe
dataiku.dku_flow_variables["DKU_DST_thematique_name"]
doesn't exist in DSS 7.0?
I tried
dataiku.get_flow_variables()
which looked equivalent, but I got None even though my partition seems to be well defined in the Flow.
Thanks for your help
-
This is because you are running the script from a notebook. Since the partition identifiers to build are configured on the recipe, they cannot be accessed from a notebook, and the dku_flow_variables dictionary is only defined when running from the recipe.
Note that both dataiku.dku_flow_variables and dataiku.get_flow_variables() will work and return the same result from the recipe, even with DSS 7.0.
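If you want the same code to run both in the recipe and in a notebook, one possible approach is to fall back to hand-picked values when the flow variables are missing. This is only a sketch: the helper resolve_flow_variables and the fallback value "test_partition" are hypothetical, not part of the dataiku API.

```python
def resolve_flow_variables(fallback):
    """Return the recipe's flow variables, or a copy of `fallback` when absent.

    The variables are absent in a notebook (or when the dataiku package
    is not installed at all), because the partitions to build are
    configured on the recipe.
    """
    try:
        import dataiku  # only importable inside a DSS environment
    except ImportError:
        return dict(fallback)
    # getattr covers both "attribute not defined" and "attribute is None".
    variables = getattr(dataiku, "dku_flow_variables", None)
    return variables if variables else dict(fallback)


# Hypothetical partition value used only for interactive testing.
variables = resolve_flow_variables({"DKU_DST_thematique_name": "test_partition"})
print("Target partition:", variables["DKU_DST_thematique_name"])
```

When run from the recipe, the fallback is ignored and the real partition identifier is used.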
Hope it helps!
-
Indeed, it worked like a charm inside my Python recipe.
Thanks a lot!