Variable Usage in Partition's Custom Python Function

MarcioCoelho
MarcioCoelho Dataiku DSS Core Designer, Registered Posts: 12 ✭✭✭✭

Hey everyone,

I have a recipe that connects two partitioned datasets.

In order to map which partitions to use from the input dataset, I'm using a python dependency function. In order for it to be as dynamic and practical as possible, I would like to use a global variable previously defined called threshold_date, but I can't seem to use it in the function.

I've tried using '${threshold_date}' but it returns the literal string ${threshold_date}.

I've also tried using the code approach like so:

import dataiku
date_val = dataiku.get_custom_variables()['threshold_date']

But I'm promptly greeted with the error "No module named dataiku", as you can see in the attached image.

So my question is: is there any way I can use a variable in a Python dependency function when mapping between partitioned datasets?

Thanks in advance.

Best regards,

Márcio Coelho


Operating system used: Windows


Best Answer

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 293 Dataiker
    Answer ✓

    Hi @MarcioCoelho,

    Thank you for clarifying. Custom Python dependency functions can't use the Dataiku APIs, so you can't read variables directly. Even if you read them from disk, you will run into the same issue as when passing a variable directly: the list passed is not interpreted correctly.
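To make the limitation concrete: a dependency function can only use plain Python, so the threshold has to be hard-coded rather than read from project variables. A minimal sketch of the mapping logic the question describes, assuming day-level partitions in YYYY-MM-DD format (the function name, signature, and threshold value are illustrative assumptions, not the exact callback DSS expects):

```python
from datetime import date, timedelta

# Hard-coded threshold (hypothetical value): dependency functions
# cannot call dataiku.get_custom_variables(), so the date must be
# written into the function itself.
THRESHOLD_DATE = date(2023, 1, 1)

def day_range_dependency(output_partition_id):
    """Map an output day partition to the list of input day partitions
    from THRESHOLD_DATE up to that day (inclusive)."""
    end = date.fromisoformat(output_partition_id)
    n_days = (end - THRESHOLD_DATE).days
    if n_days < 0:
        # Output partition predates the threshold: no input partitions.
        return []
    return [(THRESHOLD_DATE + timedelta(days=d)).isoformat()
            for d in range(n_days + 1)]
```

Changing the threshold then means editing the function by hand, which is exactly the inconvenience the workaround below avoids.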

    We have logged this limitation and will look to resolve it in the future. In the meantime, you can incorporate your variables and use set_write_partition within your Python recipe. Note that you will need to override the partition settings in the recipe by adding ignore_flow=True to every occurrence of dataiku.Dataset() in your Python code. For example:

    import dataiku

    input_dataset = dataiku.Dataset("Menu_item", ignore_flow=True)
    input_df = input_dataset.get_dataframe()
    output = dataiku.Dataset("menu_item_partition", ignore_flow=True)
    output.write_schema_from_dataframe(input_df)
    output.set_write_partition("${Menu Category}")
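A side note on the original '${threshold_date}' attempt: DSS only expands ${...} variables in contexts it controls, such as the string passed to set_write_partition above; anywhere else, the literal text comes through unchanged. A rough pure-Python analogy of that expansion using string.Template (an illustration of the substitution behaviour, not the actual DSS implementation):

```python
from string import Template

# Hypothetical project variables, standing in for what DSS stores.
project_variables = {"threshold_date": "2023-01-01"}

# Where DSS performs expansion, "${threshold_date}" becomes the value.
expanded = Template("${threshold_date}").safe_substitute(project_variables)

# Where no expansion happens (e.g. inside a dependency function),
# the literal text survives, matching the behaviour in the question.
unexpanded = Template("${threshold_date}").safe_substitute({})
```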

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 293 Dataiker

    Hi @MarcioCoelho,

    You should not need a Python dependency function to define a variable as a partition identifier. What you can do instead is use your project global variable and select "Explicit values". Then add your variable as ${variable}.

    [Screenshot: Screen Shot 2023-01-27 at 3.17.51 PM.png]

    You can also use a variable when partitioning the input dataset:

    [Screenshot: Screen Shot 2023-01-27 at 3.44.18 PM.png]

    Note, importing dataiku will not work in a Python dependency function, which is why you are seeing the module-not-found error.

    Please give this a try and let me know if you run into any issues.

    Thanks!

    Jordan

  • MarcioCoelho
    MarcioCoelho Dataiku DSS Core Designer, Registered Posts: 12 ✭✭✭✭

    Hey @JordanB,

    Thank you for your reply.

    I might not have explained it properly, but my goal isn't to define a partition via a variable, but instead to use the variable in intermediate calculations.

    I had noticed the import error, which is why I hoped there would be something similar to the ${variable} approach you propose.

    Thanks.

  • MarcioCoelho
    MarcioCoelho Dataiku DSS Core Designer, Registered Posts: 12 ✭✭✭✭

    Great, thanks for the help @JordanB, and for taking notes for this to be implemented in the future.

    I really liked your snippet and will use it in the future!

