
Retrieving the value of a metric in a scenario or as project variable

Grixis6
Level 3

Hello everyone, I was scrolling through the documentation about variables in scenarios and read that we can retrieve the value of a check/metric of a dataset in a scenario and store it as a project variable.

I tested it, but it doesn't seem to work, and the documentation on this is very short.

> How do I call a metric from a dataset in a scenario step and store it as a project variable?

Has anyone used this feature before?

[Screenshot attachment: retrieve metric.PNG]

By the way, there is a second point I don't get: the description seems to say that every check/metric result for that dataset is stored as a variable by default? And that we can retrieve it with, for example: parseJson(stepOutput_the_metrics)['PROJ.computed'].results. I had never noticed this before.

4 Replies
ismayiltahirov
Dataiker

Hello, 

You can compute metrics for a dataset or a managed folder with a Compute metrics or Run checks scenario step:

https://doc.dataiku.com/dss/latest/scenarios/steps.html#compute-metrics 
https://doc.dataiku.com/dss/latest/scenarios/steps.html#run-checks

Then you can use the value of stepOutput_<stepname> in a subsequent Set project variables or Define scenario variables step:

https://doc.dataiku.com/dss/latest/scenarios/steps.html#define-variables-set-project-variables-set-g... 

An example of parsing the JSON output of a Compute metrics step is given here:

https://doc.dataiku.com/dss/latest/scenarios/variables.html#retrieving-the-message-of-a-check 

You can also refer to the DSS formula language reference for filtering and parsing JSON objects:

https://doc.dataiku.com/dss/latest/formula/index.html#control-structures 
https://doc.dataiku.com/dss/latest/formula/index.html#object-functions 

The community post below may also be helpful; it shows an example of retrieving the files count metric and setting it as a scenario variable:

https://community.dataiku.com/t5/Using-Dataiku/Conditional-execute-of-scenario-step-without-steps-fa... 
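
If you'd rather do this programmatically than with scenario steps, a rough sketch using the public Python API could look like the following (the project key, dataset name and variable name are placeholders, and the metric value is stored as a standard project variable):

import dataiku

# Rough sketch only: compute a metric through the public Python API
# and push its value into a project variable.
client = dataiku.api_client()
project = client.get_project("PROJECT_KEY")
dataset = project.get_dataset("my_dataset")

# compute_metrics() returns the same structure the formula examples parse:
# result -> computed -> list of {"metricId": ..., "value": ...}
computed = dataset.compute_metrics()
metrics = {m["metricId"]: m["value"] for m in computed["result"]["computed"]}
record_count = metrics.get("records:COUNT_RECORDS")

# Store the value as a standard project variable
variables = project.get_variables()
variables["standard"]["record_count"] = record_count
project.set_variables(variables)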

 

Jason
Level 3

I'm definitely NOT doing it the way the documentation says to, but I'm using a Python script to toss out partitions in a partitioned dataset that don't have enough training data. To do this, I compute the metrics on the final partition that would lead to training, and then, as part of the scenario, provide only the valid partitions to model training. Most of that isn't pertinent to your question, but in the snippet below you can see how I access the metrics to make decisions for the scenario. There is quite a bit of info in the computed metrics, so I distilled it down to just what I needed with a dict comprehension, but for your purposes it would probably be good to spend some time familiarizing yourself with everything in there.

Good Luck, and good hunting!

-J

 

import dataiku

client = dataiku.api_client()
p = client.get_project("project_name_here")
d = p.get_dataset("your_dataset_here")
v = dataiku.get_custom_variables()  # project variables; assumed source of "minimum_training_support"

valid_partitions = []
for t in block_targets:  # block_targets: the candidate partitions, built earlier in the script
    computed_metrics = d.compute_metrics(str(t))  # the partition argument can be omitted
    metrics = {x["metricId"]: x["value"] for x in computed_metrics["result"]["computed"]}
    if "records:COUNT_RECORDS" not in metrics:
        print("Rejecting partition " + str(t) + " due to missing COUNT_RECORDS")
        # rejected_partitions.append(t)
        continue
    print("For partition " + str(t) + " there are " + str(metrics["records:COUNT_RECORDS"]) + " records.")
    if int(metrics["records:COUNT_RECORDS"]) < int(v["minimum_training_support"]):  # project variable controls minimum size
        print("Rejecting partition " + str(t) + " due to low COUNT_RECORDS (minimum is " + str(v["minimum_training_support"]) + ")")
        # rejected_partitions.append(t)
        continue
    valid_partitions.append(t)  # these partitions will proceed to modeling

 

Turribeach
Level 5

You are in luck, as I recently worked on this and even posted about it! You don't need to store the value in a project variable; you can use scenario variables, which are more relevant for runtime data. And you don't need Python for this either!

So start by reading this post:

https://community.dataiku.com/t5/Using-Dataiku/Conditional-execute-of-scenario-step-without-steps-fa...

So my formula looks a bit complicated:

toNumber(filter(parseJson(stepOutput_Compute_Metrics)['ProjectID.FolderID_NP']['computed'], x, x["metricId"]=="basic:COUNT_FILES")[0].value)

But if you want to understand how this works, do this: create a scenario variable and assign it the value parseJson(stepOutput_Compute_Metrics)

Compute_Metrics should be the name of the scenario step where you computed the metrics (it can't have spaces). Then run the scenario and look at the step log: you will see a line that says "Evaluated to", followed by the full JSON coming from your Compute_Metrics step in {}, followed by (class org.json.JSONObject). Copy the whole JSON, including the {}, then jump into any Prepare recipe, create a new step, select Formula as the processor and click Open Edit Panel. Now write this as the formula: parseJson("{put the JSON you got from the step log here}") (add double quotes around the JSON).

Now you have a quick way of interacting with the JSON and seeing how you can extract the value you need. I also used this site to look at the JSON in a formatted way: http://json.parser.online.fr/

The toNumber() in the formula converts the value to a number. The filter() function finds an attribute by its metric ID, since the JSON structure can change. The rest is simple JSON dictionary referencing.
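
If it helps to see what that formula is doing step by step, here is a rough Python equivalent run against a made-up payload with the same shape (the keys and metric ID mirror the formula above, the values are invented; you don't actually need Python for any of this):

import json

# Illustrative payload only: same shape as the Compute_Metrics step output
# that the formula above navigates.
step_output = """{
    "ProjectID.FolderID_NP": {
        "computed": [
            {"metricId": "basic:COUNT_FILES", "value": "42"},
            {"metricId": "basic:SIZE", "value": "123456"}
        ]
    }
}"""

parsed = json.loads(step_output)                                         # parseJson(...)
computed = parsed["ProjectID.FolderID_NP"]["computed"]                   # [...]['computed']
matches = [m for m in computed if m["metricId"] == "basic:COUNT_FILES"]  # filter(..., x, x["metricId"]=="basic:COUNT_FILES")
count_files = float(matches[0]["value"])                                 # toNumber(...[0].value)
print(count_files)  # 42.0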

Enjoy!

 

Turribeach
Level 5

PS: I got this solution with ismayiltahirov's help in a support case we worked on together. Note that you can also retrieve data from a SQL statement; see this documentation link:

https://doc.dataiku.com/dss/latest/scenarios/variables.html#using-the-results-of-a-previous-sql-step
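
The doc link covers the Execute SQL scenario step. If you ever want the programmatic route instead, a minimal sketch with SQLExecutor2 might look like this (the connection name, table and query are placeholders):

from dataiku import SQLExecutor2

# Programmatic alternative, not the scenario-step approach from the doc link above.
executor = SQLExecutor2(connection="my_connection")
df = executor.query_to_df("SELECT COUNT(*) AS row_count FROM my_table")
row_count = int(df.iloc[0, 0])  # first column of the first row
print(row_count)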

 

 
