Create a variable using values from a table
Hi everyone,
in my Dataiku flow I have a table (T-DATA-QUALITY-40) that gives me an output consisting of a row with 3 columns.
This is the output of the table
How can I create a new variable with the value of the second column (ANNO_CALENDARIO)?
Thanks in advance!
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,054 Neuron
Indeed, there is a much better way of doing this: scenario variables. The following documentation shows how to use them, including how to fetch the result of a previous SQL query step:
https://doc.dataiku.com/dss/latest/scenarios/variables.html
Scenario variables can then be used in the remaining steps of the scenario, ensuring that the recipes always use an up-to-date value. See the following pages for how to use the variables in recipes:
https://knowledge.dataiku.com/latest/mlops-o16n/variables/concept-python-sql-recipes.html
https://knowledge.dataiku.com/latest/mlops-o16n/variables/tutorial-project-variables.html
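As a rough illustration of the scenario-variable approach, here is a minimal sketch of a custom Python step in a scenario. It assumes the single-row table is exposed as a DSS dataset named T-DATA-QUALITY-40 (the dataset name in your project may differ). The helper `row_to_variables` and the wrapper `run_step` are hypothetical names introduced only for this example.

```python
def row_to_variables(records, columns):
    """Turn a single-row result into a {variable_name: value} dict.

    `records` is a list of row dicts, e.g. df.to_dict("records").
    """
    if len(records) != 1:
        raise ValueError(f"expected exactly one row, got {len(records)}")
    row = records[0]
    variables = {}
    for col in columns:
        value = row[col]
        # Convert numpy scalars to plain Python values (assumption: this
        # keeps the value easy to serialize as a variable).
        variables[col] = value.item() if hasattr(value, "item") else value
    return variables


def run_step(dataset_name="T-DATA-QUALITY-40", columns=("ANNO_CALENDARIO",)):
    # These imports only resolve inside DSS, so they live in the function.
    import dataiku
    from dataiku.scenario import Scenario

    df = dataiku.Dataset(dataset_name).get_dataframe()
    variables = row_to_variables(df.to_dict("records"), columns)
    # Scenario variables are set for the current scenario run, so later
    # steps of the same run see the freshly computed value.
    Scenario().set_scenario_variables(**variables)
```

Inside the scenario step you would call `run_step()` after the step that builds the table. Later steps should then be able to reference the value with the usual `${ANNO_CALENDARIO}` variable syntax described in the linked pages.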
Answers
-
Turribeach
What exactly are you trying to achieve? There might be a better way of doing it.
-
I need to find a way to save an output value (whether it is data/string/integer, it makes no difference) inside a variable.
This variable will then be used in a filter recipe in another flow. I hope I explained myself.
-
Turribeach
It is always best to explain the outcome you are trying to achieve, not how you think you can achieve it. Why can't you use a join recipe and apply ANNO_CALENDARIO as a filter as part of the join?
-
I need to use this variable in various parts of the flow, with both the == and >= operators.
Isn't it computationally heavy to do all this with various joins instead of precomputing the variable just once and then using it in the filter recipe?
-
Turribeach
A join with a single-row table will not add much to the overall query execution time. There are other ways of avoiding the join, such as using a WITH statement to create a variable holding the value. The issue I see with using a project variable is: how do you know the value is up to date? If you have a recipe that populates the project variable with the value, what stops you from running another recipe further down the flow with an outdated variable value?
-
I can be sure that the value of the variable will always be up to date because the flow that updates the variable (the first screenshot I posted) will be launched first.
The second flow, which uses the previously calculated variables, will be launched after that.
-
Turribeach
You will be sure. What happens when someone else who is not aware of this design limitation runs the second flow without running the first? This sort of design pattern is the definition of technical debt. In any case, you can create a variable using a Python recipe:

import dataiku

# Get a handle on the current project
client = dataiku.api_client()
project_handle = client.get_project(dataiku.default_project_key())

# Read the project variables, set one standard variable, and save them back
project_vars = project_handle.get_variables()
project_vars['standard']['some_var_name'] = 'value'
project_handle.set_variables(project_vars)

You just need to add fetching the dataset and the row/column value that you want.
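For the fetching part, something along these lines might work. This is only a sketch: it assumes the table is readable as a DSS dataset under the name you pass in, and the function names (`read_single_value`, `set_standard_variable`, `publish_variable`) are hypothetical helpers made up for this example, not part of the Dataiku API.

```python
def read_single_value(records, column):
    """Return `column` from a result that must contain exactly one row."""
    if len(records) != 1:
        raise ValueError(f"expected exactly one row, got {len(records)}")
    value = records[0][column]
    # Convert numpy scalars to plain Python values before storing them.
    return value.item() if hasattr(value, "item") else value


def set_standard_variable(variables, name, value):
    """Return a copy of the project variables with standard[name] = value."""
    updated = dict(variables)
    updated["standard"] = dict(variables.get("standard", {}))
    updated["standard"][name] = value
    return updated


def publish_variable(dataset_name, column, var_name):
    # Imported here because the dataiku package only exists inside DSS.
    import dataiku

    df = dataiku.Dataset(dataset_name).get_dataframe()
    value = read_single_value(df.to_dict("records"), column)

    project = dataiku.api_client().get_project(dataiku.default_project_key())
    project.set_variables(
        set_standard_variable(project.get_variables(), var_name, value)
    )
```

In a Python recipe you would then call, for example, `publish_variable("T-DATA-QUALITY-40", "ANNO_CALENDARIO", "ANNO_CALENDARIO")`. The staleness concern raised above still applies: nothing forces this recipe to run before the recipes that read the variable.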
-
Ok I understand, what you are saying is clear to me.
Indeed, this is a technical limitation of my solution. If I wanted to overcome this technical debt without the join, is it possible? Maybe via a scenario?
Thanks!