Email dataset based on row count from a group of datasets in one scenario
Hi - I know how to set up a scenario to check if a dataset has at least one row and email it within the reporter configuration of the scenario. Is there a way to do this for multiple datasets that are built within the flow?
For instance we have quality_data1, quality_data2, quality_data3, etc.
We only want a dataset emailed if it contains data. That is, during one build they may all have data, but during another build only one or none will.
I know we could set up a scenario to build each dataset separately and then report from there, but is there a way to do this in one scenario so that each dataset is emailed only if it has rows?
Thanks in advance for your time and help!!
Best Answer
-
Hi Anne,
As of today it is not possible to dynamically attach multiple Datasets in a single email Reporter. However, you can switch to a fully custom Scenario and leverage the Scenario API to define how the Reporter is built (e.g. by populating it only with Datasets whose record count is > 0).
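To illustrate the selection step only, here is a minimal sketch in plain Python. It assumes the record counts have already been collected into a dictionary; the function name datasets_with_rows and the sample counts are hypothetical, and how the counts are read from DSS (for example from each Dataset's record-count metric) is left out.

```python
from typing import Dict, List


def datasets_with_rows(record_counts: Dict[str, int]) -> List[str]:
    """Return the names of datasets whose record count is greater than zero."""
    return [name for name, count in record_counts.items() if count > 0]


# Hypothetical counts for one build. In a real scenario these would be read
# from DSS rather than hard-coded.
counts = {"quality_data1": 12, "quality_data2": 0, "quality_data3": 3}
to_attach = datasets_with_rows(counts)
```

Only the names in to_attach would then be used when the Reporter is built.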
Since your use case is interesting yet not trivial, I may later add a simplified example to our code sample repository. If so, I will post the link here.
Hope this helps!
Best,
Harizo
Answers
-
Thank you for the response! If I understand this correctly, I would add a new scenario that is a Python script. I would set up my trigger as normal, but within the script I would build each dataset, check its row count, and send the dataset from the script itself. I would not use the reporter to do the sending. This looks like it would work - thank you!
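The plan described above could be sketched roughly as follows. This is not the code sample from the repository: the function build_and_email_non_empty and its three callables are hypothetical placeholders, so the control flow can be shown without depending on a DSS instance. In an actual scenario script, build might wrap a call such as Scenario().build_dataset(name) from dataiku.scenario, count_rows might read the dataset's record-count metric, and send would be whatever mail mechanism is available; all three are assumptions here.

```python
from typing import Callable, Iterable, List


def build_and_email_non_empty(
    dataset_names: Iterable[str],
    build: Callable[[str], None],
    count_rows: Callable[[str], int],
    send: Callable[[str], None],
) -> List[str]:
    """Build each dataset, then send it only if it has at least one row.

    Returns the names of the datasets that were sent.
    """
    sent = []
    for name in dataset_names:
        build(name)               # placeholder for the real build step
        if count_rows(name) > 0:  # skip datasets that came out empty
            send(name)            # placeholder for the real email step
            sent.append(name)
    return sent


# Stand-in callables so the sketch runs anywhere (hypothetical data).
fake_counts = {"quality_data1": 5, "quality_data2": 0, "quality_data3": 0}
built, emailed = [], []
result = build_and_email_non_empty(
    fake_counts,
    build=built.append,
    count_rows=lambda name: fake_counts[name],
    send=emailed.append,
)
```

With these stand-ins every dataset is built, but only quality_data1 is sent.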
-
Hello again Anne,
I took some inspiration from your use case and added a new code sample to our repository. Feel free to reuse and modify it as you see fit for your specific needs!
Best,
Harizo
-
Awesome - we will give it a try!
-
Hi Harizo - thanks for the code! I have been trying to set this up for one of my flows as described in the Git repository, but when I run the reporter setup scenario with this Python code I get the following error at line 59 of the code:
“AttributeError: 'ComputedMetrics' object has no attribute 'get_raw'”
Do you know why this error might be coming up? Could it be due to our version of Dataiku or something else we may need to modify in the code?
Thanks for the help!
-
Hi aazariaz,
The ComputedMetrics.get_raw() method was indeed introduced recently, following the release of Dataiku 10. The fix for earlier versions is easy, though: replace metrics.get_raw().get("metrics") with metrics.raw.get("metrics").
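If the same script has to run on both older and newer versions, one option is a small helper that tries get_raw() first and falls back to the raw attribute. This is only a sketch: get_metrics_list is a hypothetical name, and the two stub classes below merely imitate the two behaviours described above so the snippet can run without DSS.

```python
def get_metrics_list(computed_metrics):
    """Return the "metrics" entry of a ComputedMetrics-like object.

    Uses get_raw() when the object provides it, otherwise the raw attribute.
    """
    get_raw = getattr(computed_metrics, "get_raw", None)
    raw = get_raw() if callable(get_raw) else computed_metrics.raw
    return raw.get("metrics")


# Stubs imitating the two situations (hypothetical payloads, not real DSS data).
class _NewStyle:
    def get_raw(self):
        return {"metrics": [{"id": "demo"}]}


class _OldStyle:
    raw = {"metrics": [{"id": "demo"}]}


new_result = get_metrics_list(_NewStyle())
old_result = get_metrics_list(_OldStyle())
```

Both calls return the same list, so the rest of the script does not need to know which version it is running on.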
Best,
Harizo