Create a variable using values from a table

MassimoRighi96 · May 8

Hi everyone,

In my Dataiku flow I have a table (T-DATA-QUALITY-40) whose output is a single row with 3 columns.

This is the output of the table

How can I create a new variable with the value of the second column (ANNO_CALENDARIO)?

Thanks in advance!

Turribeach · May 8

Indeed there is a much better way of doing this, and that's using scenario variables. The following documentation shows how to use them, including how to fetch the result of a previous SQL query step:

https://doc.dataiku.com/dss/latest/scenarios/variables.html

Scenario variables can then be used in the remaining steps, ensuring that the recipes always use an up-to-date value. See below for how to use the variables in recipes:

https://knowledge.dataiku.com/latest/mlops-o16n/variables/concept-python-sql-recipes.html

https://knowledge.dataiku.com/latest/mlops-o16n/variables/tutorial-project-variables.html
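
As a rough illustration of reading such a variable in a Python recipe, here is a minimal sketch. It assumes the variable is named ANNO_CALENDARIO and that `dataiku.get_custom_variables()` is available in your DSS version; `filter_by_year` and `load_variables` are hypothetical helper names, not part of the Dataiku API.

```python
def filter_by_year(rows, variables, name="ANNO_CALENDARIO", at_least=False):
    """Keep rows whose year equals (or is >=) the value stored in the variable."""
    # Variable values may arrive as strings, so cast before comparing.
    year = int(variables[name])
    if at_least:
        return [r for r in rows if int(r[name]) >= year]
    return [r for r in rows if int(r[name]) == year]


def load_variables():
    """Return the variables visible to the recipe (only works inside DSS)."""
    import dataiku  # imported here because the package only exists inside DSS
    return dataiku.get_custom_variables()
```

Inside a recipe you would call `filter_by_year(rows, load_variables())` for the == case and pass `at_least=True` for the >= case.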

Turribeach · May 8

What exactly are you trying to achieve? There might be a better way of doing it.

MassimoRighi96 · May 8

I need to find a way to save an output value (whether it is data/string/integer, it makes no difference) inside a variable.
This variable will then be used within the filter recipe in another flow.

I hope I explained myself.

Turribeach · May 8

It is always best to explain the outcome you are trying to achieve, not how you think you can achieve it. Why can't you use a join recipe and use ANNO_CALENDARIO as a filter as part of the join?

MassimoRighi96 · May 8

I need to use this variable in various parts of the flow, with both the == and the >= operators.
Isn't it computationally heavy to do all this with several joins, instead of precomputing the variable just once and then using it in the filter recipe?

Turribeach · May 8

A join with a single-row table will not add much to the overall query execution time. There are other ways of avoiding the join, like using the WITH statement to create a variable holding the value. The issue I see with using a project variable is: how do you know the value is up to date? If you have a recipe that populates the project variable with the value, what stops you from running another recipe further down the flow with an outdated variable value?
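
To make the WITH idea concrete, here is a small self-contained sketch that uses an in-memory SQLite database rather than a real DSS connection. The `sales` table and its columns are invented for illustration, and `t_data_quality_40` stands in for the one-row table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_data_quality_40 (ANNO_CALENDARIO INTEGER);
    INSERT INTO t_data_quality_40 VALUES (2023);
    CREATE TABLE sales (id INTEGER, anno INTEGER);
    INSERT INTO sales VALUES (1, 2022), (2, 2023), (3, 2024);
""")

# The CTE exposes the single-row value as a named "parameter";
# the cross join attaches it to every row so it can be used in WHERE.
query = """
    WITH params AS (SELECT ANNO_CALENDARIO AS anno_ref FROM t_data_quality_40)
    SELECT s.id, s.anno
    FROM sales AS s
    CROSS JOIN params AS p
    WHERE s.anno >= p.anno_ref
    ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 2023), (3, 2024)]
```

Because the value is read at query time, it cannot go stale the way a stored variable can.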

MassimoRighi96 · May 8

I can be sure that the value of the variable will always be up to date, because the flow that updates the variable (the first screenshot I posted) will be launched first.

After that, the second flow will be launched, and it will use the variables calculated previously.

Turribeach · May 8

You will be sure. What happens when someone else not aware of this design limitation runs the second flow without running the first? This sort of design pattern is the definition of technical debt. In any case you can create a variable using a Python recipe:

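import dataiku

# Fetch the current project's variables, update one, and save them back
client = dataiku.api_client()
project_handle = client.get_project(dataiku.default_project_key())
project_vars = project_handle.get_variables()
project_vars['standard']['some_var_name'] = 'value'
project_handle.set_variables(project_vars)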

You just need to add code that fetches the dataset and reads the row/column value that you want.
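
One possible way to fill in that missing part is sketched below. It assumes the dataset is named T-DATA-QUALITY-40 and is read with `Dataset.get_dataframe()`; `first_row_value`, `store_variable` and `run_recipe` are hypothetical helper names, and the variable name ANNO_CALENDARIO is just an example.

```python
def first_row_value(records, column):
    """Return the value of `column` from the first record (a list of dicts)."""
    if not records:
        raise ValueError("the source dataset returned no rows")
    value = records[0][column]
    # pandas returns numpy scalars; .item() turns them into plain Python
    # values so they can be serialized with the project variables.
    return value.item() if hasattr(value, "item") else value


def store_variable(project, name, value):
    """Write `value` into the project's standard variables under `name`."""
    variables = project.get_variables()
    variables["standard"][name] = value
    project.set_variables(variables)


def run_recipe():
    """Wire the two helpers together (only works inside DSS)."""
    import dataiku  # imported here because the package only exists inside DSS

    df = dataiku.Dataset("T-DATA-QUALITY-40").get_dataframe()
    value = first_row_value(df.to_dict("records"), "ANNO_CALENDARIO")
    project = dataiku.api_client().get_project(dataiku.default_project_key())
    store_variable(project, "ANNO_CALENDARIO", value)
```

Calling `run_recipe()` at the end of the Python recipe would refresh the variable each time the recipe runs.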

MassimoRighi96 · May 8

Ok I understand, what you are saying is clear to me.
Indeed this is a technical limitation of my solution.

If I wanted to overcome this technical debt without the join, is it possible? Maybe via scenario?

Thanks!
