Data quality: Monitoring dataset processing
Hi,
I'm asking how DSS monitors issues during dataset processing. I see two kinds of potential issues:
- Volume: an inconsistent number of records in a dataset (e.g. I expect at least 1k records per day for my "webtraffic" dataset)
- Schema / values: one or more rows have fields that don't respect the defined schema or the expected values (e.g. in the webtraffic dataset, IP addresses are not valid, or a date field contains unexpected values)
Is there a way to monitor/handle those errors in DSS and be notified by email or something?
Thanks,
Romain.
Best Answer
-
Hi Romain,
These features are on our roadmap. You can get in touch with our Sales team if you'd like more details.
As of today, you could write a custom recipe in Python, for instance, and implement your own tests.
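For example, a minimal sketch of such a recipe might look like this (the "webtraffic" dataset and the "date" and "ip" column names are placeholders, adapt them to your schema):

```python
# Sketch of a Python recipe that validates volume and values before writing output
import dataiku
import pandas as pd

# Placeholder dataset name
df = dataiku.Dataset("webtraffic").get_dataframe()

# Volume check: expect at least 1k records per day
MIN_ROWS_PER_DAY = 1000
daily_counts = df.groupby(pd.to_datetime(df["date"]).dt.date).size()
low_days = daily_counts[daily_counts < MIN_ROWS_PER_DAY]
if not low_days.empty:
    raise ValueError("Days below %d rows: %s" % (MIN_ROWS_PER_DAY, low_days.to_dict()))

# Value check: IPv4 addresses must match a simple pattern
bad_ips = df[~df["ip"].astype(str).str.match(r"^(\d{1,3}\.){3}\d{1,3}$")]
if not bad_ips.empty:
    raise ValueError("%d rows with invalid IP addresses" % len(bad_ips))
```

Raising an exception makes the recipe (and the job running it) fail, which you can then use as a hook for a notification.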
Answers
-
Good news!
Thanks for the quick reply.
-
Hello Jeremy,
Have those monitoring features been developed since?
Or are they still on your roadmap?
Sébastien
-
Ignacio_Toledo
Hi @sbourgeois-k,
They have been developed! You should check the metrics and checks documentation:
- Metrics: https://doc.dataiku.com/dss/latest/scenarios/metrics.html
- Checks: https://doc.dataiku.com/dss/latest/scenarios/checks.html
Also, there is a hands-on activity at https://knowledge.dataiku.com/latest/courses/automation/metrics-checks-hands-on.html
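If you want to drive metrics and checks programmatically, here is a minimal sketch using the DSS Python API (the "MYPROJECT" project key and "webtraffic" dataset name are placeholders):

```python
# Sketch: recompute a dataset's metrics and run its checks via the Python API
import dataiku

client = dataiku.api_client()
project = client.get_project("MYPROJECT")
dataset = project.get_dataset("webtraffic")

# Recompute the metrics configured on the dataset's Status tab
dataset.compute_metrics()

# Run the checks defined on top of those metrics
result = dataset.run_checks()
print(result)  # the exact structure varies by DSS version; look for the outcome fields
```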
Cheers!
-
Another tip that might be useful: you can automate metrics and checks using scenarios in your project. Here is a good resource from the Academy that walks through this: https://academy.dataiku.com/automation-course-1?next=%2Fautomation-course-1%2F668968
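For instance, a custom Python scenario step could recompute metrics and run checks before building downstream data (dataset names below are placeholders; as I understand it, a check ending in ERROR fails the step, so the build only runs when the checks pass):

```python
# Sketch of a custom Python scenario step ("Execute Python code")
from dataiku.scenario import Scenario

s = Scenario()
s.compute_dataset_metrics("webtraffic")  # refresh the dataset's metrics
s.run_dataset_checks("webtraffic")       # fails the scenario if a check errors
s.build_dataset("webtraffic_enriched")   # only runs if the checks passed
```

You can then attach a "Mail" reporter to the scenario to get the email notification Romain asked about originally.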