How do I send an email to the user based on a condition on count of records in a dataset?

Prince
Prince Registered Posts: 2 ✭✭

hi,

After reading the documentation, I came across the "Compute metrics" step in Scenarios, but how do I retrieve the record count of the dataset using ${stepOutput_the_metrics}? Then, if the count is more than 0, I want to trigger an email to the user.

I am on DSS version 13

Any help is appreciated.

Thanks

Operating system used: Windows

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,252 Neuron

    Hi, have a look at this post which covers in detail how to extract values from a metric into a scenario variable:

  • Prince
    Prince Registered Posts: 2 ✭✭

    @Turribeach thanks for the quick response. I looked at the solution, but it needs the project ID and dataset ID to be mentioned in the formula. Can this be done using custom Python code instead? I was doing something like this. The code gives me the metric count, but how do I use the "query_fail_count" variable in the Reporter section to check the condition and send the email? For example: if outcome == 'SUCCESS' && query_fail_count > 0, then send an email.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,252 Neuron
    edited February 3

    Personally I think it’s more elegant without using Python. You can however use Python too. You will need to define a scenario variable to be able to use the value in other scenario steps. This code should do that:

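    import dataiku
    from dataiku.scenario import Scenario

    # Read the most recently computed record count for the dataset.
    mydataset = dataiku.Dataset("dataset")
    record_count = mydataset.get_last_metric_values().get_global_value('records:COUNT_RECORDS')

    # Expose it as a scenario variable so later steps and reporters can use it.
    Scenario().set_scenario_variables(query_fail_count=record_count)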
    

    However, I don't like this solution for two reasons:

    1. dataset.get_last_metric_values() assumes metrics have run successfully at least once for the dataset. It will fail if the record count metric has never been computed or has never completed successfully.
    2. dataset.get_last_metric_values() may give you outdated data, as it does not guarantee that the value is the current record count of the dataset. If your last metric computation failed, this call will return the previous value without any warning!

    Fixing the above issues in Python is possible, but it takes several more lines of code to execute the metric, wait for it to complete, check the result, etc. That is why the solution I proposed in my other post is more elegant, much simpler and more robust.
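For reference, the extra Python handling mentioned above could look roughly like the sketch below. It is only an illustration, not a tested DSS recipe. The helper name `publish_fresh_record_count` and its parameters are hypothetical. It assumes that `Scenario.compute_dataset_metrics` is available in your DSS version, runs synchronously, and raises if the computation fails. The dataset and scenario objects are passed in as arguments so the logic can be tried outside DSS.

```python
def publish_fresh_record_count(dataset, scenario, dataset_name,
                               metric_id="records:COUNT_RECORDS",
                               variable_name="query_fail_count"):
    """Recompute metrics, read one metric, and store it as a scenario variable.

    `dataset` is expected to behave like dataiku.Dataset and `scenario`
    like dataiku.scenario.Scenario (hypothetical helper, not part of DSS).
    """
    # Recompute metrics first so the value read below is not a stale one.
    # Assumption: this call blocks until done and raises if it fails.
    scenario.compute_dataset_metrics(dataset_name)

    # Read the freshly computed value and refuse to continue without one.
    value = dataset.get_last_metric_values().get_global_value(metric_id)
    if value is None:
        raise ValueError(
            "Metric %r has no value for dataset %r" % (metric_id, dataset_name)
        )

    # Store an integer so conditions can compare it numerically.
    count = int(value)
    scenario.set_scenario_variables(**{variable_name: count})
    return count


# Inside a DSS "Execute Python code" scenario step (not runnable outside DSS):
#   import dataiku
#   from dataiku.scenario import Scenario
#   publish_fresh_record_count(dataiku.Dataset("dataset"), Scenario(), "dataset")
```

Once the variable is set, a later step or reporter condition such as outcome == 'SUCCESS' && query_fail_count > 0 can refer to it.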

    Add a Compute metrics step to your scenario:

    This guarantees the record count has been calculated successfully as part of the scenario run. If the compute metrics step fails, the scenario fails. Then fetch the metric value and define the variable:

    Finally, use it in any subsequent scenario step for conditional execution of that step:
