
Retrieving the value of a metric in a scenario or as project variable

Grixis6
Level 3

Hello everyone, I was reading the documentation about scenario variables and saw that we can retrieve the value of a dataset's check or metric in a scenario and store it as a project variable.

I tested it, but it doesn't seem to work, and the documentation on this is very short.

> How do I call a dataset metric in a scenario step in order to store it as a project variable?

Has anyone used this feature before?

[Attached screenshot: retrieve metric.PNG]

By the way, there is a second point I don't get: the description seems to say that every check/metric result for that dataset is stored as a variable by default, and that we can access it with, for example: parseJson(stepOutput_the_metrics)['PROJ.computed'].results. I never noticed this before.

9 Replies
ismayiltahirov
Dataiker

Hello, 

You can compute a metric for a dataset or a managed folder with the Compute metrics or Run checks scenario step.

https://doc.dataiku.com/dss/latest/scenarios/steps.html#compute-metrics 
https://doc.dataiku.com/dss/latest/scenarios/steps.html#run-checks

Then you can use the value of stepOutput_<stepname> in a following Set project variables or Define scenario variables step:

https://doc.dataiku.com/dss/latest/scenarios/steps.html#define-variables-set-project-variables-set-g... 

An example of parsing the JSON output of a Compute metrics step is given here:

https://doc.dataiku.com/dss/latest/scenarios/variables.html#retrieving-the-message-of-a-check 

You can also refer to the DSS formula language reference for filtering and parsing JSON objects

https://doc.dataiku.com/dss/latest/formula/index.html#control-structures 
https://doc.dataiku.com/dss/latest/formula/index.html#object-functions 

The community post below may also be helpful. It shows an example of retrieving the files_count metric and assigning it to a scenario variable.

https://community.dataiku.com/t5/Using-Dataiku/Conditional-execute-of-scenario-step-without-steps-fa... 

 

Jason
Level 4

I'm definitely NOT doing it the way the documentation says to do it, but I'm using a python script to toss out partitions in a partitioned dataset that failed to achieve a significant amount of training data.  To do this, I compute the metrics on the final partition that would lead to training, and then - as part of the scenario - provide only those partitions to the model training.  Most of that is not pertinent to your question, but in the snippet below you can see how I access the metrics to make decisions for the scenario.  There seems to be quite a bit of info in the computed metrics, so I distilled it down to just what I needed with a dict comprehension, but for your purposes it would probably be good to spend some time familiarizing yourself with everything in there.

Good Luck, and good hunting!

-J

 

import dataiku

client = dataiku.api_client()
p = client.get_project("project_name_here")
d = p.get_dataset("your_dataset_here")
# Assumption: v was not defined in the snippet as posted; here it is taken to be the project's standard variables
v = p.get_variables()["standard"]
valid_partitions = []
for t in block_targets:  # targets are the partitions (block_targets is defined elsewhere in the script)
    computed_metrics = d.compute_metrics(str(t))  # partition can be omitted
    # Keep only metric id -> value; the full result holds much more information
    metrics = {x["metricId"]: x["value"] for x in computed_metrics["result"]["computed"]}
    # Check that the metric exists before reading it, otherwise the lookup below raises KeyError
    if "records:COUNT_RECORDS" not in metrics:
        print("Rejecting partition " + str(t) + " due to missing COUNT_RECORDS")
        continue
    print("For partition " + str(t) + " there are " + str(metrics["records:COUNT_RECORDS"]) + " records.")
    if int(metrics["records:COUNT_RECORDS"]) < v["minimum_training_support"]:  # project variable controls minimum size
        print("Rejecting partition " + str(t) + " due to low COUNT_RECORDS (" + str(v["minimum_training_support"]) + ")")
        continue
    valid_partitions.append(t)  # these partitions will proceed to modeling

 

Turribeach

You are in luck as I recently worked on this and even posted about it! You don't need to set the value in a project variable, you can use scenario variables which are more relevant to run time data. Also you don't need Python for this, it can be done without Python!

So start by reading this post:

https://community.dataiku.com/t5/Using-Dataiku/Conditional-execute-of-scenario-step-without-steps-fa...

So my formula looks a bit complicated:

toNumber(filter(parseJson(stepOutput_Compute_Metrics)['ProjectID.FolderID_NP']['computed'], x, x["metricId"]=="basic:COUNT_FILES")[0].value)

But if you want to understand how this works do this. Create a scenario variable and assign the value as: parseJson(stepOutput_Compute_Metrics)

Compute_Metrics should be the name of the scenario step where you calculated metrics (can't have spaces). Then run the scenario step and look at the scenario log. You will have a line in the step log that says "Evaluated to" and next to it the full JSON that comes from your Compute_Metrics step in {} followed by (class org.json.JSONObject). Copy the whole JSON  including the {} and jump into any prepare recipe, create a new step, select formula as the processor and click on Open Edit Panel. Now write this as the formula: parseJson("{put the JSON you got from the step log execution here}") (add double quotes around the JSON).

Now you have a quick way of interacting with the JSON and seeing how to extract the value you need. I also used this site to look at the JSON in a formatted way: http://json.parser.online.fr/

So the toNumber() in the formula converts the value to a number. The filter() function finds an element of the "computed" list by its metricId, since the position of a metric in the JSON structure can change. The rest is simple JSON dictionary referencing.
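If it helps to see the same logic outside the formula editor, here is a minimal Python sketch of what the formula does. The JSON is a made-up sample that only mirrors the shape described in this thread (a 'ProjectID.FolderID_NP' key holding a 'computed' list of objects with 'metricId' and 'value'); the real step output has more fields, and the values and the second metric are invented for illustration. The helper name get_metric is hypothetical.

```python
import json

# Hypothetical sample shaped like the step output described above.
# Values are invented; a real Compute metrics output contains more fields.
step_output = """
{
  "ProjectID.FolderID_NP": {
    "computed": [
      {"metricId": "basic:SIZE", "value": "1024"},
      {"metricId": "basic:COUNT_FILES", "value": "42"}
    ]
  }
}
"""

def get_metric(step_output_json, object_key, metric_id):
    """Mirror of toNumber(filter(parseJson(...)[key]['computed'], x, ...)[0].value)."""
    computed = json.loads(step_output_json)[object_key]["computed"]
    # filter(): keep only the entries whose metricId matches
    matches = [m for m in computed if m["metricId"] == metric_id]
    if not matches:
        raise KeyError("metric " + metric_id + " not found under " + object_key)
    # [0].value then toNumber(): take the first match and convert it to a number
    return float(matches[0]["value"])

files_count = get_metric(step_output, "ProjectID.FolderID_NP", "basic:COUNT_FILES")
print(files_count)
```

Filtering by metricId rather than by position is the point of the filter() call: the lookup keeps working even if the order of the metrics in the list changes.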

Enjoy!

 

Turribeach

PS: I got this solution with ismayiltahirov's help in a support case we worked on together. Note that you can also retrieve data from a SQL statement. See this documentation link:

https://doc.dataiku.com/dss/latest/scenarios/variables.html#using-the-results-of-a-previous-sql-step

 

 

tgb417

@Turribeach ,

I'm looking for other values created by a compute metric step.  How does one find the values in that JSON object?

--Tom
Turribeach

Hi Tom, the easiest way is to follow the steps I posted above and extract the whole JSON, so you can play with it in a Jupyter notebook and work out exactly which values you want to extract. Re-reading my post, though, I realize it's not so easy to follow. I will try to post some screenshots tomorrow so that it's easier to see how to extract the whole JSON.
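As a rough sketch of that notebook exploration: once the JSON from the "Evaluated to" log line is pasted into a string, a few lines of Python can list every metric it contains. The sample below is invented and only assumes the shape discussed in this thread (top-level keys per dataset or folder, each with a 'computed' list of 'metricId'/'value' objects); the function name list_metrics and the sample values are hypothetical.

```python
import json

def list_metrics(step_output_json):
    """Return {object_key: {metricId: value}} for every entry in the step output."""
    parsed = json.loads(step_output_json)
    return {
        key: {m["metricId"]: m.get("value") for m in entry.get("computed", [])}
        for key, entry in parsed.items()
    }

# Paste the JSON copied from the scenario step log here (invented sample below).
pasted = """
{
  "PROJECT_NAME.DATASET_NAME_NP": {
    "computed": [
      {"metricId": "records:COUNT_RECORDS", "value": "1500"},
      {"metricId": "basic:COUNT_COLUMNS", "value": "12"}
    ]
  }
}
"""

for object_key, metrics in list_metrics(pasted).items():
    print(object_key)
    for metric_id, value in metrics.items():
        print("  " + metric_id + " = " + str(value))
```

The metricId strings printed this way are the ones to plug into the filter() call of the scenario formula.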

tgb417

With the help of the Dataiku Support team:

I discovered one of my problems. I was not using the correct dataset name.

If your dataset is non-partitioned, you have to append "_NP" to what you think the dataset name is. From a quick search, this does not seem to be documented anywhere.
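To make the naming rule concrete, here is a small Python sketch that builds the key for a non-partitioned dataset and, if the key is wrong, reports the keys that are actually present. It only covers the "_NP" case mentioned here; the key format for partitioned datasets is not covered in this thread, so it is left out. The helper names and the sample data are hypothetical.

```python
def metrics_key(project_key, dataset_name):
    """Key used in the step output for a non-partitioned dataset: PROJECT.DATASET_NP."""
    return project_key + "." + dataset_name + "_NP"

def find_entry(parsed_output, project_key, dataset_name):
    """Look up the dataset entry, listing the available keys if the name is wrong."""
    key = metrics_key(project_key, dataset_name)
    if key not in parsed_output:
        raise KeyError(key + " not found; available keys: " + ", ".join(sorted(parsed_output)))
    return parsed_output[key]

# Invented sample of an already-parsed step output
parsed = {"PROJECT_NAME.DATASET_NAME_NP": {"computed": []}}

print(metrics_key("PROJECT_NAME", "DATASET_NAME"))
print(find_entry(parsed, "PROJECT_NAME", "DATASET_NAME"))
```

Listing the available keys in the error message is a cheap way to spot this kind of naming mismatch without digging through the scenario log.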

I'm now at least able to save metric variables to the project variables.

So a step forward here.

Still would like to see how you take a look at the values.  I did discover a method to save all of the values to the project variables.

parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed

However, that is a bit messy.

--Tom
tgb417

@Turribeach ,

Just posted this product idea.

https://community.dataiku.com/t5/Product-Ideas/Improved-UX-for-Senario-Variables-Setup/idi-p/30707 

Please feel free to upvote and comment on the idea.

Updated with new URL link. Thanks @Turribeach 

--Tom
Turribeach

Hi Tom, the link to the Idea is not right, the correct link is this one: https://community.dataiku.com/t5/Product-Ideas/Improved-UX-for-Senario-Variables-Setup/idi-p/30707

 
