
Set up warning for failure in the scenario

Ruta
Level 2

Hello,

I need to monitor the percentage of 0s in a dataset, include the last 3 weeks of data in a weekly email, and raise an alert in the email notification when more than 90% of the values are 0.

How can this be done in Dataiku?

Thanks in advance

1 Solution
qweenfo
Level 2

To keep track of zero percentages in your dataset using Dataiku and send out weekly emails containing three weeks of data, as well as trigger an alert for zero percentages exceeding 90%, follow these steps:

  1. Set up a recipe or a scenario step in Dataiku to compute the percentage of zeros in your dataset.
  2. Schedule that calculation to run weekly, for example with a time-based trigger on a scenario.
  3. Configure a mail reporter or a Send message step in the scenario to dispatch the weekly email with the calculated data.
  4. Add a condition so that the alert is sent only when the zero percentage exceeds 90%.

By following these steps, you'll ensure consistent monitoring of zero percentages and receive alerts whenever thresholds are breached, all conveniently managed within Dataiku's workflow.
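As a rough illustration of steps 1 and 4, here is a minimal Python sketch of the calculation and the threshold check. It uses plain lists rather than a Dataiku dataset, and the names `zero_percentage`, `needs_alert` and `ALERT_THRESHOLD` are made up for this example; they are not part of any Dataiku API.

```python
ALERT_THRESHOLD = 90.0  # percent, taken from the requirement in the question


def zero_percentage(values):
    """Return the percentage (0-100) of entries equal to 0; empty input gives 0.0."""
    values = list(values)
    if not values:
        return 0.0
    zeros = sum(1 for v in values if v == 0)
    return 100.0 * zeros / len(values)


def needs_alert(pct, threshold=ALERT_THRESHOLD):
    """True only when the percentage is strictly above the threshold."""
    return pct > threshold


# Hypothetical column values: 11 zeros out of 12 entries.
sample = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0]
pct = zero_percentage(sample)
alert = needs_alert(pct)
```

In a Python recipe the values would typically come from one column of the input dataset, and the result would be written to an output dataset or exposed as a metric.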

9 Replies
Turribeach

To send data as part of your scenario look at scenario reports:

https://knowledge.dataiku.com/latest/mlops-o16n/automation/tutorial-reporters.html

To send an email notification based on the percentage of zeros in your dataset, you need to define a metric on the dataset. Then retrieve the value of the metric in the scenario and use conditional step execution on a reporter step to send the email. Here is a post covering conditional step execution in detail.

Ruta
Level 2
Author

Hey Turribeach, thanks so much for the answer.

I am still a bit confused. I have created a scenario with steps that calculate the percentage of 0s in a column and store it in a new dataset. Now I am trying to send an email with the last 3 weeks of data from that new dataset, with some filtering applied. How can I do that?

I tried adding a Send message step to the scenario, but it attaches the complete dataset without any filtering.

I also tried using custom Python code to filter the output dataset and store the result in a new DataFrame variable. But how can I use this variable from the Python script in the email body or in the Send message step?



@Ruta wrote:

I still have some confusion here, I have created scenario with steps to calculate percentage of 0 in a column and stored it in a new data set.


To take conditional steps in a scenario, you need the percentage of 0s to be available as a dataset metric, so that you can retrieve its value and store it in a scenario variable. This is independent of your requirement to send the dataset containing the calculated percentage of 0s as a mail attachment. In other words, you need both things: the metric and the percentage-of-0s dataset. Have another look at my post and the post from @Grixis, as it explains how to do this. If you are still having issues, describe all the steps you followed and where you see an error.


@Ruta wrote:

Now I m trying to send email with last 3 weeks data from new data set with some filtering on new data set, how can I do that ? I tried adding send message step in scenario, but it adds complete data set as an attachment without any filtering on it.

Mail dataset attachments can't have any filters applied to them, but this is easy to work around. Add a new Filter recipe that uses your original percentage-of-0s dataset as input and set whatever filters you want. Then use the new output dataset as your mail attachment.
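If you prefer to do the same filtering in code, the "last 3 weeks" condition can be sketched as below. This is only an illustration of the filter logic: the column names `run_date` and `zero_pct`, the function `last_n_weeks` and the sample rows are assumptions, not something taken from your project.

```python
from datetime import date, timedelta


def last_n_weeks(rows, today, weeks=3, date_key="run_date"):
    """Keep rows whose date falls in the `weeks` weeks ending at `today`.

    The window is (today - weeks, today]: the cutoff day itself is excluded,
    so a weekly run yields exactly `weeks` rows.
    """
    cutoff = today - timedelta(weeks=weeks)
    return [r for r in rows if cutoff < r[date_key] <= today]


# Hypothetical weekly results, one row per Sunday run.
history = [
    {"run_date": date(2024, 3, 3), "zero_pct": 40.0},
    {"run_date": date(2024, 3, 10), "zero_pct": 55.0},
    {"run_date": date(2024, 3, 17), "zero_pct": 92.5},
    {"run_date": date(2024, 3, 24), "zero_pct": 61.0},
    {"run_date": date(2024, 3, 31), "zero_pct": 95.0},
]

recent = last_n_weeks(history, today=date(2024, 3, 31))
```

The output of such a step would then be written to its own dataset, which is the one to attach to the mail.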

 

 

 

 

Ruta
Level 2
Author

Is it possible to attach more than one result dataset in a Send message step (in a scenario)? If yes, how?


Why do you think you can attach only one result dataset? Show a screenshot of how you are doing it.

Ruta
Level 2
Author

Hi,

I am able to attach multiple datasets in a single Send message step.

I am currently facing one more issue: when I run the Python recipe manually to calculate the percentage of zeros in the input dataset, the output dataset is overwritten on every run, so the historical data is deleted. Could you please help with this problem?

The following code is not working:

# Write recipe outputs
results_w_brands_no_filter_S3_PROD_predcited_zeros = dataiku.Dataset("results_w_brands_no_filter_S3_PROD_predcited_zeros",ignore_flow=True)
results_w_brands_no_filter_S3_PROD_predcited_zeros.write_with_schema(new_result)

 

Thanks
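A note on the behaviour described above: by default `write_with_schema` replaces the contents of the output dataset, so each run discards earlier results. One way to keep history is to combine the existing rows with the new ones before writing. The sketch below shows only that merge logic on plain Python dictionaries; the `append_history` helper, the `run_date` key and the sample rows are hypothetical, and how the existing rows are loaded is left out. Dataiku recipes may also offer an append option on the output in their Inputs/Outputs settings, which is worth checking for your dataset type before writing custom code.

```python
def append_history(history, new_rows, key="run_date"):
    """Return history plus new_rows, sorted by key.

    If a new row has the same key as an existing one (for example the recipe
    was re-run on the same day), the new row replaces the old one instead of
    creating a duplicate.
    """
    merged = {row[key]: row for row in history}
    for row in new_rows:
        merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


# Hypothetical existing results and the result of the current run.
existing = [
    {"run_date": "2024-03-17", "zero_pct": 92.5},
    {"run_date": "2024-03-24", "zero_pct": 61.0},
]
current = [{"run_date": "2024-03-31", "zero_pct": 95.0}]

combined = append_history(existing, current)
```

The combined rows are what would be written to the output dataset, so that earlier weeks survive each run.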

Ruta
Level 2
Author

Hey, thanks so much for the answer.

On the output dataset (a CSV file), how can I apply a filter to fetch only the last 3 weeks of records and send a mail every weekend?

I also need to highlight records with a warning if the error value is > 90.

How can I do this in Dataiku? Do I need to write a Python recipe, or is there some other way to do it? Can you please explain in detail? I'm new to Dataiku.

 

Thanks 
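For the highlighting part, one simple option is to add a flag to each record before it goes into the mail. The sketch below builds a plain-text summary and marks rows above the threshold; the field names `run_date` and `zero_pct` and the `format_report` helper are assumptions made for the example.

```python
def format_report(rows, threshold=90.0):
    """Build one text line per row and mark rows strictly above the threshold."""
    lines = []
    for r in rows:
        flag = "WARNING" if r["zero_pct"] > threshold else "ok"
        lines.append(f"{r['run_date']}  {r['zero_pct']:6.2f}%  {flag}")
    return "\n".join(lines)


# Hypothetical rows for the last three weekly runs.
rows = [
    {"run_date": "2024-03-17", "zero_pct": 61.0},
    {"run_date": "2024-03-24", "zero_pct": 95.5},
    {"run_date": "2024-03-31", "zero_pct": 90.0},
]

report = format_report(rows)
```

The same flag could instead be stored as an extra column in the dataset that is attached to the mail.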

Grixis
Level 4

Hello @Ruta 

Indeed, there is no explicit example of this in the documentation, but I think the two attachments meet your need.

If you first run a Compute metrics step on a dataset, its result is kept as 'stepOutput'.

Later in your scenario, you can then add a visual step that reads stepOutput_your_name_of_previous_update_metrics_steps and uses it to set project variables.

In the attached example, I update the metrics of a dataset in my project in a step named the_metrics. Right after it, I set variables from the object stepOutput_the_metrics, storing its full content as complete_json_example.

Then there are 3 other example variables that show how to use the Dataiku formula language to filter for specific values: filter(parseJson(stepOutput_the_metrics)["database.your_dataset_name"]['computed'], x, x["metricId"]=="col_stats:MEAN:build_time_avg")[0].value, where col_stats:MEAN:build_time_avg is the ID of the metric you want to capture.
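To make the formula easier to read, here is the same lookup written as a small Python sketch. The sample payload is hypothetical: its shape (a dataset key containing a 'computed' list of entries with "metricId" and "value") is inferred from the formula above, and a real step output will contain more fields.

```python
import json


def metric_value(step_output_json, dataset_key, metric_id):
    """Parse the step output and return the value of the first matching metric."""
    payload = json.loads(step_output_json)
    computed = payload[dataset_key]["computed"]
    matches = [m for m in computed if m["metricId"] == metric_id]
    return matches[0]["value"]


# Hypothetical step output, shaped like the structure the formula navigates.
sample_output = json.dumps({
    "database.your_dataset_name": {
        "computed": [
            {"metricId": "records:COUNT_RECORDS", "value": "1200"},
            {"metricId": "col_stats:MEAN:build_time_avg", "value": "42.5"},
        ]
    }
})

value = metric_value(
    sample_output,
    "database.your_dataset_name",
    "col_stats:MEAN:build_time_avg",
)
```

Like the formula, this raises an error if no metric with that ID is present, so the Compute metrics step has to run first.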
