Create a variable using values from a table

MassimoRighi96 · May 2024

Hi everyone,

In my Dataiku flow I have a table (T-DATA-QUALITY-40) whose output is a single row with 3 columns.

This is the output of the table

How can I create a new variable holding the value of the second column (ANNO_CALENDARIO)?

Thanks in advance!

Turribeach · May 2024

Indeed there is a much better way of doing this, and that's using scenario variables. The following documentation shows how to use them, including how to fetch the result of a previous SQL query step:

https://doc.dataiku.com/dss/latest/scenarios/variables.html

Scenario variables can then be used in the remaining steps of the scenario, ensuring that the recipes always use an up-to-date value. See below for how to use the variables in recipes:

https://knowledge.dataiku.com/latest/mlops-o16n/variables/concept-python-sql-recipes.html

https://knowledge.dataiku.com/latest/mlops-o16n/variables/tutorial-project-variables.html
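
As a rough illustration of the scenario approach, here is a minimal sketch of a custom Python scenario step. It assumes an earlier step in the same scenario has already rebuilt T-DATA-QUALITY-40 and that the dataset holds exactly one row. The variable name `anno_calendario` and the helper functions are made up for this example; `Scenario.set_scenario_variables` and `Dataset.get_dataframe` come from the Dataiku Python API, and `run_step()` only works inside DSS.

```python
def single_value(rows, column):
    """Return `column` from a result that is expected to hold exactly one row."""
    if len(rows) != 1:
        raise ValueError(f"expected exactly 1 row, got {len(rows)}")
    value = rows[0][column]
    # Convert numpy scalars to plain Python values so they serialise cleanly
    return value.item() if hasattr(value, "item") else value


def publish_to_scenario(scenario, rows):
    """Store ANNO_CALENDARIO as a scenario variable (hypothetical name)."""
    value = single_value(rows, "ANNO_CALENDARIO")
    scenario.set_scenario_variables(anno_calendario=value)
    return value


def run_step():
    # Only runs inside DSS, as the body of a custom Python scenario step
    import dataiku
    from dataiku.scenario import Scenario

    df = dataiku.Dataset("T-DATA-QUALITY-40").get_dataframe()
    publish_to_scenario(Scenario(), df.to_dict("records"))
```

Calling `run_step()` in the scenario step should make the value available to the later steps of that scenario run, which is what keeps it tied to a fresh build of the source table.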

Turribeach · May 2024

What exactly are you trying to achieve? There might be a better way of doing it.

MassimoRighi96 · May 2024

I need to find a way to save an output value (whether it is a date, a string or an integer makes no difference) in a variable.
This variable will then be used in a filter recipe in another flow.

I hope I explained myself.

Turribeach · May 2024

It is always best to explain the outcome you are trying to achieve, not how you think you can achieve it. Why can't you use a join recipe and apply ANNO_CALENDARIO as a filter as part of the join?

MassimoRighi96 · May 2024

I need to use this variable in various parts of the flow, with both the == and >= operators.
Isn't it computationally heavy to do all this with several joins, instead of computing the variable just once and then using it in the filter recipe?

Turribeach · May 2024

A join with a single-row table will not add much to the overall query execution time. There are other ways of avoiding the join, like using a WITH statement to create a variable holding the value. The issue I see with using a project variable is: how do you know the value is up-to-date? If you have a recipe that populates the project variable with the value, what stops you from running another recipe further down the flow with an outdated variable value?
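
To make the WITH idea concrete, here is a small self-contained sketch using Python's built-in sqlite3 module rather than a real Dataiku connection. The table names `t_data_quality` and `sales` and their contents are invented for illustration; only the ANNO_CALENDARIO column name comes from the question. The single-row CTE behaves like a variable that can be compared with either = or >=.

```python
import sqlite3

# In-memory database with made-up data, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_data_quality (ANNO_CALENDARIO INTEGER);
INSERT INTO t_data_quality VALUES (2023);
CREATE TABLE sales (id INTEGER, anno INTEGER);
INSERT INTO sales VALUES (1, 2022), (2, 2023), (3, 2024);
""")

# The CTE exposes the single-row value; the cross join adds no extra rows
QUERY = """
WITH params AS (SELECT ANNO_CALENDARIO AS anno FROM t_data_quality)
SELECT s.id
FROM sales AS s CROSS JOIN params AS p
WHERE s.anno {op} p.anno
ORDER BY s.id
"""


def ids_matching(op):
    """Return the ids of sales rows whose year compares to the parameter via `op`."""
    if op not in ("=", ">="):
        raise ValueError(f"unsupported operator: {op}")
    return [row[0] for row in conn.execute(QUERY.format(op=op))]


equal_ids = ids_matching("=")
later_ids = ids_matching(">=")
```

Because the value is read from the table at query time, there is no stored copy that can go stale, which is the concern raised about project variables.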

MassimoRighi96 · May 2024

I will be sure that the value of the variable will always be updated because the flow that updates the variable (the first screenshot I posted) will be launched first.

After which the second flow will be launched which will use the variables calculated previously.

Turribeach · May 2024

You will be sure. What happens when someone else not aware of this design limitation runs the second flow without running the first? This sort of design pattern is the definition of technical debt. In any case you can create a variable using a Python recipe:

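import dataiku

# Get a handle on the current project through the API client
client = dataiku.api_client()
project_handle = client.get_project(dataiku.default_project_key())
# Read the existing variables, set one standard variable, then save them back
project_vars = project_handle.get_variables()
project_vars['standard']['some_var_name'] = 'value'
project_handle.set_variables(project_vars)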

You just need to add fetching the dataset and the row/column value that you want.
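
One possible way to add that missing fetch step is sketched below. It assumes the dataset is named T-DATA-QUALITY-40 as in the question and contains exactly one row; the variable name `anno_calendario` and both helper functions are hypothetical. The Dataiku calls (`Dataset.get_dataframe`, `get_variables`, `set_variables`) only work inside DSS, so they are kept in a separate function.

```python
def with_standard_variable(project_vars, name, value):
    """Return a copy of the project variables with standard[name] set to value."""
    updated = dict(project_vars)
    standard = dict(updated.get("standard", {}))
    standard[name] = value
    updated["standard"] = standard
    return updated


def publish_project_variable():
    # Only runs inside DSS, for example in a Python recipe
    import dataiku

    df = dataiku.Dataset("T-DATA-QUALITY-40").get_dataframe()
    if len(df) != 1:
        raise ValueError(f"expected exactly 1 row, got {len(df)}")
    value = df["ANNO_CALENDARIO"].iloc[0]
    # Convert numpy scalars to plain Python values before saving
    value = value.item() if hasattr(value, "item") else value

    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())
    project.set_variables(
        with_standard_variable(project.get_variables(), "anno_calendario", value)
    )
```

A filter recipe could then reference the value as `${anno_calendario}`, with the caveat discussed above that nothing forces this recipe to run before the recipes that read the variable.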

MassimoRighi96 · May 2024

OK, I understand; what you are saying is clear to me.
Indeed, this is a limitation of my solution.

Is it possible to avoid this technical debt without using the join? Maybe via a scenario?

Thanks!
