How to access specific check results in a scenario

Marty
Marty Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 11 Partner
edited July 2024 in Using Dataiku

I have a simple dataset containing 1 row and about 20 columns. Each column represents a Boolean value of whether or not a particular threshold has been breached. If the result is true for any of the 20 columns, then a message is sent to a designated MS Teams Channel that a breach has occurred. I would like to supplement that message with an indication of the specific column(s) that had the breach(es).

My question is: how do I access the values of those specific columns, and how do I get only those where the value is true (that is, where a breach occurred)?

I currently have a custom Python step that dumps all the variables from the scenario. This is the code that does that:

from dataiku.scenario import Scenario
import json
print(json.dumps(Scenario().get_all_variables(), indent=2))

Here is a portion of the output produced:

{
    "CHECKS_PROJECT.Alerts_Dataset_NP": {
        "startTime": 1686615603874,
        "endTime": 1686615603930,
        "runs": [
            {
                "name": "Seg1_Corr_Breached",
                "partition": "NP"
            },
            {
                "name": "Seg2_Corr_Breached",
                "partition": "NP"
            },
        ],
        "results": [
            {
                "check": {
                    "metricId": "cell:Seg1_Corr_Breach:Cell_Value",
                    "values": [
                        "false"
                    ],
                    "type": "valueSet",
                    "meta": {
                        "name": "Value in set",
                        "label": "Seg1_Corr_Breached"
                    },
                    "computeOnBuildMode": "PARTITION"
                },
                "value": {
                    "outcome": "ERROR",
                    "message": "true not in "
                }
            },
            {
                "check": {
                    "metricId": "cell:Seg2_Corr_Breach:Cell_Value",
                    "values": [
                        "false"
                    ],
                    "type": "valueSet",
                    "meta": {
                        "name": "Value in set",
                        "label": "Seg2_Corr_Breached"
                    },
                    "computeOnBuildMode": "PARTITION"
                },
                "value": {
                    "outcome": "OK",
                    "message": "false"
                }
            }
        ]
    }
}

What I need to do is find every check whose "outcome" is "ERROR" and then get that check's "metricId". In the end I will have about 20 different metric IDs whose outcomes I want to search, and I only want to transmit the names of those that have an error. Right now, all I'm able to do is transmit whether or not there is ANY error; I can't specify where the error is.
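
To make the goal concrete, here is a minimal sketch of that filtering in plain Python. It assumes the variables dictionary has the shape shown in the dump above; `sample_variables` and `failed_metric_ids` are hypothetical names used only for illustration, and the sample data is a trimmed copy of that dump.

```python
# Hypothetical sample that mirrors the structure of the dump above.
sample_variables = {
    "CHECKS_PROJECT.Alerts_Dataset_NP": {
        "results": [
            {"check": {"metricId": "cell:Seg1_Corr_Breach:Cell_Value"},
             "value": {"outcome": "ERROR"}},
            {"check": {"metricId": "cell:Seg2_Corr_Breach:Cell_Value"},
             "value": {"outcome": "OK"}},
        ]
    },
    "unrelated_variable": "some string",
}


def failed_metric_ids(variables):
    """Return the metricId of every check whose outcome is ERROR."""
    failed = []
    for entry in variables.values():
        # Skip variables that are not check-result dictionaries.
        if not isinstance(entry, dict):
            continue
        for result in entry.get("results") or []:
            check = result.get("check") or {}
            value = result.get("value") or {}
            if value.get("outcome") == "ERROR" and check.get("metricId"):
                failed.append(check["metricId"])
    return failed


print(failed_metric_ids(sample_variables))
```

Inside a scenario Python step, the same function could presumably be given the result of `Scenario().get_all_variables()` in place of the sample dictionary.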

Thanks for the help!


Operating system used: Windows



Answers

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker
    edited July 2024

    Hi @martyg,

    If my understanding is correct, you have a "Run checks" step in a scenario, and when the checks fail, you want to send the ID of each failed metric in a Microsoft Teams message.

    You can accomplish this by using a custom variable in your Microsoft Teams reporter.

    First, click "CREATE CUSTOM VARIABLES" in the reporter settings, and replace the code with the following code. This will create a custom variable called "${failedMetrics}" which will contain a list of failed metric IDs:

    import json
    
    # compute your additional variables from the list of report items 
    # and return them as a dictionary.
    def get_variables(items_json, scenario_run_json, step_run_output_json):
        step_run_output = json.loads(step_run_output_json)
        failed_metrics = []
    
        # Iterate through all steps in the scenario and find
        # failed metric IDs
        for step in step_run_output.values():
            for activity in step.values():
                results = activity.get("results")
                if not results:
                    continue
    
                for result in results:
                    check = result.get("check")
                    value = result.get("value")
                    if not check or not value:
                        continue
    
                    metric_id = check.get("metricId")
                    outcome = value.get("outcome")
                    if not metric_id or not outcome:
                        continue
    
                    if outcome == "ERROR":
                        failed_metrics.append(metric_id)
        
        return {
            "failedMetrics": ", ".join(failed_metrics)
        }

    Now add the "${failedMetrics}" variable somewhere in your message template.


    Thanks,

    Zach

  • Marty
    Marty Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 11 Partner
    edited July 2024

    Hi Zach,

    Thanks so much for the response. Unfortunately, it won't work for me. We are using the online instance of Dataiku, and I just confirmed with tech support that the option to create custom variables (which requires an admin of the instance to enable "Write unisolated code") is not available for Cloud instances of Dataiku.

    So, I think I've gotten a way to use a Python step to get the information I need. I can put the values I need into a list called breach_list. Here is my code:

    import dataiku
    
    # Metric IDs to inspect; each has the form cell:<column>_Breach:Cell_Value
    metric_list = ['cell:Incr_Std_Dev_MAE_Breach:Cell_Value',
                   'cell:Std_Dev_MAE_Breach:Cell_Value',
                   'cell:Incr_Scorability_Breach:Cell_Value']
    
    dataset = dataiku.Dataset("Alerts_Dataset")
    metrics = dataset.get_last_metric_values()
    #ids = metrics.get_all_ids()
    ids = metric_list
    
    breach_list = []
    
    for metric_id in ids:
        metric = metrics.get_metric_by_id(metric_id)
        for value in metric['lastValues']:
            if value['value'] == 'true':
                print("Metric", metric_id, "has value:", value['value'])
                # Keep the column part of the ID and drop its trailing "_Breach"
                column = metric_id.split(':')[1]
                breach_list.append(column.rsplit('_', 1)[0])
        
    print(breach_list)

    Now, how do I get that "breach_list" into a message? I've tried to do this both through another step (a "send message" step) and via the reporters interface in the scenario settings, but no matter what I do I can't seem to access the breach_list. Someone also suggested this site (https://doc.dataiku.com/dss/latest/scenarios/custom_scenarios.html#send-custom-reports) as a means to send a custom report. How do I set up the message channel so that it uses the same Teams webhook I have set up in the reporter? Could that then send the values in that breach_list using Python instead of JSON?

    Thanks.

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker
    edited July 2024

    Hi @martyg,

    Thank you for the details. I understand now that using custom variables isn't an option in your case.

    You can access the breach_list in other steps by saving it as a scenario variable. You can do this by adding the following code at the end of your Python step:

    from dataiku.scenario import Scenario
    s = Scenario()
    s.set_scenario_variables(breach_list=breach_list)

    You can then access the ${breach_list} variable in your reporter, or from a "send message" step.



  • Marty
    Marty Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 11 Partner
    edited July 2024

    If you'll indulge me with just a little more assistance.


    I got it to work, but had to alter your solution a little. Here is the code I used:

    from dataiku.scenario import Scenario
    scenario = Scenario()
    variable_param = {"breach_list": str(breach_list)}
    scenario.set_scenario_variables(**variable_param)

    This code worked for me; the code you gave threw an error. Probably because I'm doing this through an MS Teams message card.

    Now I have a list of values, originally created in a Python step, called "breach_list". This list contains anywhere from 1 to 24 strings representing variables that have breached a threshold. I can now successfully access these variables in my final "send message" step. That step sends a message to an MS Teams page if a breach occurs (determined by whether a check on the data fails). I want to be able to display the individual elements of that breach list, but right now all I can do is display the entire list. Here is what it looks like right now:

    Dataiku Image.png

    The JSON code in the MS Teams message is as follows:

    {
        "@type": "MessageCard",
        "@context": "https://schema.org/extensions",
        "themeColor": "${if(outcome == 'SUCCESS', '29AF5D', '')}${if(outcome == 'FAILED', 'F44336', '')}${if(outcome == '', '28A9DD', '')}",
        "summary": "${scenarioName} run report",
        "sections": [
            {
                "text": "${if(outcome == 'SUCCESS', '✅', '')}${if(outcome == 'FAILED', '🔴', '')}${if(outcome == '', '🔔', '')} ${scenarioName}: **${outcome}**",
                "facts": [
                    { "name": "Project:", "value": "${scenarioProjectKey}" },
                    { "name": "Triggered by:", "value": "${triggerName}" },
                    { "name": "Thresholds breached:", "value": "${breach_list}"}
                ]
            }
        ],
        "potentialAction": [
            {
                "@type": "OpenUri",
                "name": "View Report",
                "targets": [
                    { "os": "default", "uri": "https://dss-09e7274b-8eaa4b52-dku.us-east-1.app.dataiku.io/projects/LOOKERREPORTS/webapps/Jv8n57P_credit-score-monitoring-in-bokeh/view" }
                ]
            }
        ]
    }

    In the above code block, all I do is send it ${breach_list}, but I'd really like to iterate through the elements of that list and print each one on a separate line. Something like this:

    Thresholds breached:

    Increase in Scorability Mismatch

    EXP Percent Problem Cases 7 Days or Less

    EXP Percent Problem Cases 20 to 40 Days

    Can you suggest how to do this in JSON and in an MS Teams message card? Is there a construct to replace this line:

    { "name": "Thresholds breached:", "value": "${breach_list}"}

    with something that iterates through the elements of breach_list and then lists them?
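
One possible workaround, sketched here under assumptions rather than confirmed against DSS or Teams: do the iteration in the Python step instead of in the JSON, and store an already formatted string in the scenario variable. `format_breaches` is a hypothetical helper, and the `"<br>"` separator is only a guess at what the card renders as a line break, so it may need to be swapped for another marker.

```python
def format_breaches(breach_list, separator="<br>"):
    """Join breach names into one string with a separator between items."""
    # Hypothetical helper; "<br>" is an assumed line-break marker.
    return separator.join(str(name) for name in breach_list)


breach_list = [
    "Increase in Scorability Mismatch",
    "EXP Percent Problem Cases 7 Days or Less",
    "EXP Percent Problem Cases 20 to 40 Days",
]

# str() keeps Python's list syntax (brackets and quotes) in the text.
print(str(breach_list))
# Joining yields plain text with the separator between items instead.
print(format_breaches(breach_list))

# In the scenario's Python step (requires DSS, so commented out here):
# from dataiku.scenario import Scenario
# Scenario().set_scenario_variables(breach_list=format_breaches(breach_list))
```

The `${breach_list}` fact in the card would then stay as it is, since the variable already holds the formatted text. A raw newline as separator may not survive substitution into the JSON template, so that variant would need testing first.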

    Thanks for all the help. I really appreciate it!

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker

    Hi @martyg,

    It looks like you received a response to your question in another post, so I'm just adding a link here for future reference: Using JSON to travers a list of values in a Message Card in an MS Teams Reporter of a Scenario.

    If you still need assistance, please let me know.
