Retrieving the value of a metric in a scenario or as a project variable

Grixis6
Grixis6 Registered Posts: 15 ✭✭✭✭✭

Hello everyone, I was scrolling through the documentation about variables in scenarios and read that we can retrieve the value of a check/metric of a dataset in a scenario and store it as a project variable.

I tested it, but it doesn't seem to work, and the text about it is very short.

> How do you call a metric from a dataset in a scenario step and store it as a project variable?

Has anyone used this feature before?

[Attachment: retrieve metric.PNG]

By the way, there is a second point I don't get: the description seems to say that every check/metric result for that dataset is stored as a variable by default, and that we can call it with, for example: parseJson(stepOutput_the_metrics)['PROJ.computed'].results. I had never noticed this point before.

Answers

  • ismayiltahirov
    ismayiltahirov Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 4 Dataiker

    Hello,

    You can compute a metric on a dataset or a managed folder with a Compute metrics or Run checks scenario step.

    https://doc.dataiku.com/dss/latest/scenarios/steps.html#compute-metrics
    https://doc.dataiku.com/dss/latest/scenarios/steps.html#run-checks

    Then you can use the value of stepOutput_<stepname> in a subsequent Set project variables or Define scenario variables step:

    https://doc.dataiku.com/dss/latest/scenarios/steps.html#define-variables-set-project-variables-set-global-variables

    An example of parsing the JSON output of a Compute metrics step is given here:

    https://doc.dataiku.com/dss/latest/scenarios/variables.html#retrieving-the-message-of-a-check

    You can also refer to the DSS formula language reference for filtering and parsing JSON objects

    https://doc.dataiku.com/dss/latest/formula/index.html#control-structures
    https://doc.dataiku.com/dss/latest/formula/index.html#object-functions

    The community post below may also be helpful; it contains an example of retrieving the files_count metric and setting it as a scenario variable.

    https://community.dataiku.com/t5/Using-Dataiku/Conditional-execute-of-scenario-step-without-steps-failing-or/td-p/26710
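
    If you prefer to do this in code rather than with the Set project variables step, here is a minimal Python sketch of the same idea using the public API. The project key "MYPROJECT", dataset name "my_dataset" and variable name "record_count" are placeholders to replace with your own; the "computed" list returned by compute_metrics() has the same {metricId, value} entries as the JSON exposed in stepOutput_<stepname>.

    import dataiku

    # Minimal sketch, assuming placeholder project/dataset/variable names.
    client = dataiku.api_client()
    project = client.get_project("MYPROJECT")
    dataset = project.get_dataset("my_dataset")

    # Compute the dataset's metrics and pick one value by its metric id.
    computed = dataset.compute_metrics()["result"]["computed"]
    record_count = next(m["value"] for m in computed if m["metricId"] == "records:COUNT_RECORDS")

    # Store the value as a project variable.
    variables = project.get_variables()
    variables["standard"]["record_count"] = record_count
    project.set_variables(variables)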

  • Jason
    Jason Registered Posts: 32 ✭✭✭✭✭
    edited July 17

    I'm definitely NOT doing it the way the documentation says to, but I'm using a Python script to throw out the partitions of a partitioned dataset that failed to reach a sufficient amount of training data. To do this, I compute the metrics on the final partition that would feed training, and then, as part of the scenario, provide only the valid partitions to the model training. Most of that is not pertinent to your question, but in the snippet below you can see how I access the metrics to make decisions for the scenario. There is quite a bit of information in the computed metrics, so I distilled it down to just what I needed with a dict comprehension; for your purposes it would probably be worth spending some time familiarizing yourself with everything in there.

    Good Luck, and good hunting!

    -J

    import dataiku

    client = dataiku.api_client()
    p = client.get_project("project_name_here")
    d = p.get_dataset("your_dataset_here")

    # block_targets (the list of partitions to evaluate) and v (the project variables dict)
    # are assumed to be defined earlier in the script.
    valid_partitions = []  # these partitions will proceed to modeling
    for t in block_targets:  # targets are the partitions
        computed_metrics = d.compute_metrics(str(t))  # partition argument can be omitted for non-partitioned datasets
        # Reduce the computed metrics to a simple {metricId: value} dict
        metrics = {x["metricId"]: x["value"] for x in computed_metrics["result"]["computed"]}
        if "records:COUNT_RECORDS" not in metrics:
            print("Rejecting partition " + str(t) + " due to missing COUNT_RECORDS")
            # rejected_partitions.append(t)
            continue
        print("For partition " + str(t) + " there are " + str(metrics["records:COUNT_RECORDS"]) + " records.")
        if int(metrics["records:COUNT_RECORDS"]) < v["minimum_training_support"]:  # project variable controls minimum size
            print("Rejecting partition " + str(t) + " due to low COUNT_RECORDS (minimum is " + str(v["minimum_training_support"]) + ")")
            # rejected_partitions.append(t)
            continue
        valid_partitions.append(t)

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,139 Neuron

    You are in luck, as I recently worked on this and even posted about it! You don't need to set the value in a project variable; you can use scenario variables, which are more relevant for runtime data. Also, you don't need Python for this; it can all be done with scenario steps and formulas!

    So start by reading this post:

    https://community.dataiku.com/t5/Using-Dataiku/Conditional-execute-of-scenario-step-without-steps-failing-or/m-p/26710

    So my formula looks a bit complicated:

    toNumber(filter(parseJson(stepOutput_Compute_Metrics)['ProjectID.FolderID_NP']['computed'], x, x["metricId"]=="basic:COUNT_FILES")[0].value)

    But if you want to understand how this works, do this: create a scenario variable and assign it the value: parseJson(stepOutput_Compute_Metrics)

    Compute_Metrics should be the name of the scenario step where you calculated the metrics (it can't have spaces). Then run the scenario and look at the scenario log. You will find a line in the step log that says "Evaluated to", followed by the full JSON that comes from your Compute_Metrics step in {}, and then (class org.json.JSONObject). Copy the whole JSON, including the {}, then jump into any Prepare recipe, create a new step, select Formula as the processor, and click Open Edit Panel. Now write this as the formula: parseJson("{put the JSON you got from the step log here}") (add double quotes around the JSON).

    Now you have a quick way of interacting with the JSON and seeing how to extract the value you need. I also used this site to look at the JSON in a formatted way: http://json.parser.online.fr/

    The toNumber() in the formula converts the value to a number. The filter() function finds the metric by its ID, since the JSON structure can change. The rest is simple JSON dictionary referencing.
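
    If it helps to see the same logic outside the formula language, here is a small Python sketch of what the formula does, assuming raw_json is the JSON you copied from the step log and that 'ProjectID.FolderID_NP' matches your own project and folder (both are placeholders):

    import json

    # raw_json: the "Evaluated to ..." JSON copied from the scenario step log (placeholder).
    step_output = json.loads(raw_json)

    # Same logic as the formula: find the metric by its id, take its value, convert to a number.
    computed = step_output["ProjectID.FolderID_NP"]["computed"]
    files_count = next(int(m["value"]) for m in computed if m["metricId"] == "basic:COUNT_FILES")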

    Enjoy!

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,139 Neuron

    PS: I got this solution with ismayiltahirov's help in a support case we worked on together. Note that you can also retrieve data from a SQL statement. See this documentation link:

    https://doc.dataiku.com/dss/latest/scenarios/variables.html#using-the-results-of-a-previous-sql-step

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @Turribeach,

    I'm looking for other values created by a compute metric step. How does one find the values in that JSON object?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,139 Neuron

    Hi Tom, the easiest way is to follow the steps I posted above and extract the whole JSON so you can play with it in a Jupyter notebook and work out the exact values you want to extract. But re-reading my post, I admit it's not so easy to follow. I will try to post some screenshots tomorrow so that it's easier to see how to extract the whole JSON.
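
    For example, once you have the whole JSON in a notebook, something like this (a rough sketch, with raw_json standing in for the copied step output) will list every metric id and value so you can see what is available:

    import json

    raw = json.loads(raw_json)  # raw_json: the step output copied from the scenario log (placeholder)
    for item_key, payload in raw.items():           # one entry per dataset/folder, e.g. "PROJECT.DATASET_NP"
        for metric in payload.get("computed", []):  # each computed metric
            print(item_key, metric.get("metricId"), metric.get("value"))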

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    With the help of the Dataiku Support team:

    I discovered one of my problems. I was not using the correct dataset name.

    If your dataset is non-partitioned, you have to append "_NP" to what you think the dataset name is. In a quick search, this does not really seem to be documented anywhere.

    I'm now at least able to save metric variables to the project variables.

    So a step forward here.

    I would still like to see how you take a look at the values. I did discover a way to save all of the values to the project variables:

    parseJson(stepOutput_Compute_Metrics)['PROJECT_NAME.DATASET_NAME_NP'].computed

    However, that is a bit messy.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @Turribeach,

    Just posted this product idea.

    https://community.dataiku.com/t5/Product-Ideas/Improved-UX-for-Senario-Variables-Setup/idi-p/30707

    Please feel free to upvote and comment on the idea.

    Updated with the new URL link. Thanks @Turribeach.
