How to access global variables inside visual recipes?
I have a global variable declared like this:
I want to use it inside the visual recipe like this:
As you can see, it gives me the error `Unknown input column`.
I also want to access nested variables, so I tried:
${visual_recipes_params.metric_selection.time_column}
which does not work either.
How to access global variables and nested global variables inside visual recipes?
Best Answer
-
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
The scenario that you describe is precisely what Dataiku Applications were created for:
- Packaging of a flow into an application, accessible to business users with a simple interface
- Enabling concurrent execution of the same flow with different parameters.
Rather than asking the users to map columns in the variables JSON (which is not really a business user interface), ask them to map the columns in their own file by renaming the columns before uploading.
Complete this tutorial to understand how to build your application. It is really simple. https://academy.dataiku.com/dataiku-applications-tutorials-open
I hope this helps. Good luck
Answers
-
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
Variable expansion is not available everywhere, as explained in the documentation: "Some select configuration fields in the DSS interface can perform variables expansion using ${variable} syntax."
https://doc.dataiku.com/dss/latest/variables/index.html
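For fields where expansion is not available, one workaround is to read the variables in a Python recipe instead. Below is a minimal sketch, assuming the variables arrive as a dictionary (in DSS this would presumably come from `dataiku.get_custom_variables()`) and that nested objects may arrive as JSON-encoded strings. The helper `resolve_variable` and the sample values are hypothetical:

```python
import json


def resolve_variable(variables, dotted_path):
    """Walk a nested variables dict along a dotted path such as 'a.b.c'."""
    current = variables
    for key in dotted_path.split("."):
        # Assumption: a nested object may be stored as a JSON string,
        # so decode it before descending one level further.
        if isinstance(current, str):
            current = json.loads(current)
        current = current[key]
    return current


# Hypothetical variables, mirroring the structure used in the question.
variables = {
    "visual_recipes_params": {
        "metric_selection": {"time_column": "order_date"}
    }
}

time_column = resolve_variable(
    variables, "visual_recipes_params.metric_selection.time_column"
)
```

A missing key raises `KeyError`, which is usually preferable to silently continuing with a wrong column name.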
What is your overall objective? It seems you are trying to define a generic recipe to work with tables with different schemas. There might be better capabilities to achieve your objective.
I hope this helps.
-
Yes, you are correct. The generic structure of the input data is the same, but the column names can differ.
Can you tell me a little more about a better alternative to this `global variable` method?
-
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
It would still be good to understand what challenge you are trying to solve, so that I can point you to the right solution.
If you need to execute the same flow on different input datasets, then you should look into:
- Dataiku Applications: However, a constant schema is still a requirement. https://videos.dataiku.com/watch/GuevWgkMCMrmFXAUNnnDiw?
- Application as recipes, in which you package an entire flow as a visual recipe. https://videos.dataiku.com/watch/8feWuYFCQtkXHJ7AYa5vre?
I hope this helps.
Best regards
-
Sure, let me describe my problem. I am creating a pipeline to process the input data.
(input data) --> process step 1 --> process step 2 --> process step 3 --> Final csv
The input data has the following format for example:
ID y x1 x2 x3 x4 class
1 2 1 5 6 6 'a'
1 2 3 4 6 6 'c'
2 4 2 4 1 2 'd'
This data will be used for linear regression.
The processing involves multiple steps, such as:
1. Selecting the required columns
2. Parsing the date column
3. Doing some NA filling, custom Python recipes, etc.
Now, the thing is that the column headers can differ, but the data is always in this format:
ID y x1 x2 x3 class
The names can differ, for example:
ID --> vehicle registration number
x1 --> mileage
x2 --> fuel efficiency
y --> cost of servicing
or
ID --> person ID
x1 --> distance
x2 --> time
y --> toll cost
etc...
So I need to build the whole flow such that modifying only the global variables is enough for everything to work. To be more precise, this will be used by non-technical people, so I tried to limit them to the global variables so that they don't need to change anything else in the flow. I hope this answers your question and that you will be able to suggest a better solution.
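To illustrate the idea, here is a minimal sketch of how a Python step could map the user-specific headers onto the fixed names. The `column_mapping` dictionary stands in for the global variable, and `to_canonical` is a hypothetical helper, not a Dataiku API. Rows are plain dictionaries to keep the example self-contained:

```python
def to_canonical(rows, column_mapping):
    """Rename user-specific columns to the fixed names the flow expects.

    column_mapping maps canonical name -> column name used in the input.
    Columns that are not mentioned in the mapping are dropped.
    """
    renamed = []
    for row in rows:
        # Fail early if the mapping refers to a column the data lacks.
        missing = [src for src in column_mapping.values() if src not in row]
        if missing:
            raise KeyError(f"Input is missing mapped columns: {missing}")
        renamed.append(
            {canonical: row[src] for canonical, src in column_mapping.items()}
        )
    return renamed


# Hypothetical mapping that a user would edit in the global variables.
column_mapping = {
    "ID": "vehicle registration number",
    "y": "cost of servicing",
    "x1": "mileage",
    "x2": "fuel efficiency",
}

# Made-up sample row, only for illustration.
rows = [
    {
        "vehicle registration number": "AB-123",
        "cost of servicing": 250,
        "mileage": 42000,
        "fuel efficiency": 6.1,
        "colour": "red",
    }
]

canonical_rows = to_canonical(rows, column_mapping)
```

With this shape, every later step only refers to `ID`, `y`, `x1`, and so on, and the mapping is the single place a user has to edit.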
-
Sorry to ask one more question: the data is not actually a CSV file. It resides in a database, and renaming its column headers is not possible. For example, if the original dataframe has a column named "cat_age", then after all the processing the user wants some parameters related to "cat_age". Following your suggestion, I can name the X variables "feature_1", "feature_2", and so on, but I don't know how many features there will be; it could be 10 or 11. What do you suggest in that case?
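One possible direction, sketched under assumptions: do the renaming inside the flow rather than in the database, generate the generic names for however many feature columns are listed, and keep the reverse mapping so the final parameters can be reported under the original names. The helpers `build_feature_mapping` and `rename_keys`, the column names other than "cat_age", and all values below are hypothetical:

```python
def build_feature_mapping(feature_columns):
    """Assign generic names feature_1..feature_n to any number of columns."""
    forward = {
        name: f"feature_{i}" for i, name in enumerate(feature_columns, start=1)
    }
    # Reverse lookup, used to restore the original names at the end.
    reverse = {generic: name for name, generic in forward.items()}
    return forward, reverse


def rename_keys(record, mapping):
    """Rename the keys found in mapping; leave all other keys unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical feature list; it could hold 10, 11, or any other number.
feature_columns = ["cat_age", "cat_weight", "cat_height"]
forward, reverse = build_feature_mapping(feature_columns)

# Original names -> generic names before the processing steps.
row = {"ID": 7, "cat_age": 3, "cat_weight": 4.2, "cat_height": 25}
generic_row = rename_keys(row, forward)

# Made-up per-feature results, mapped back to the original names.
results = {"feature_1": 0.8, "feature_2": -0.1, "feature_3": 0.3}
labelled_results = rename_keys(results, reverse)
```

The source table is never modified; only the copy that flows through the processing steps carries the generic names.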