How to Automatically Create an Up-to-Date Dataset from Data Quality Rules?

Hi everyone,
I'm working on a project where I've applied several Data Quality rules to a dataset (MAST_prepared). Using the Data Quality tab, I clicked "Create dataset from rules data", which generated a new dataset (MAST_prepared_rules).
👉 Issue: this dataset is static. It does not update automatically when I rebuild the Flow, and it's connected to the original dataset by a dotted line in the Flow, which I understand means it's not part of the executable pipeline.
What I'm trying to achieve is a way to regenerate the data quality rule results every time the Flow is built, ideally using a scenario or recipe, so I can further process or visualize the rule violations.
❓ My question:
Is there a recommended or native way to:
- Create a dataset from data quality rules
- That updates dynamically in the Flow when upstream data changes
- Without having to manually rewrite the rules in Python?
Any advice or experience would be greatly appreciated — thanks a lot in advance! 🙏
Best Answer
Turribeach
You can get this data from the Python API and then build a dataset yourself as part of your Flow (see the sketch below).
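As a rough illustration of that approach, here is a minimal sketch of a Python recipe that runs inside the Flow. It assumes a recent DSS version where a dataset's data quality ruleset is reachable via get_data_quality_rules(); check the API reference for your exact version, as method and result field names may differ. MAST_prepared_rules_dynamic is a hypothetical output dataset name:

```python
# Sketch of a Python recipe that keeps rule results in the executable
# pipeline. Assumes DSS 12+ where DSSDataset exposes a data quality
# ruleset via get_data_quality_rules(); verify the exact method and
# result field names against the API reference for your DSS version.
import dataiku
import pandas as pd

client = dataiku.api_client()
project = client.get_default_project()

# Ruleset attached to the monitored dataset from the question
ruleset = project.get_dataset("MAST_prepared").get_data_quality_rules()

# Recompute the rules so the results reflect the freshly rebuilt data,
# then read back the latest status (result structure is assumed here).
ruleset.compute_rules()
raw_status = ruleset.get_status().get_raw()

rows = []
for rule in raw_status.get("ruleResults", []):  # field names assumed
    rows.append({
        "rule_id": rule.get("ruleId"),
        "outcome": rule.get("outcome"),
    })

# Writing to a managed dataset makes this step part of the Flow, so a
# scenario rebuild refreshes it. The output name is hypothetical.
output = dataiku.Dataset("MAST_prepared_rules_dynamic")
output.write_with_schema(pd.DataFrame(rows))
```

Because the output is a regular managed dataset fed by a recipe, it rebuilds (solid line, not dotted) whenever your scenario rebuilds the Flow.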
Alternatively, you can migrate the DSS runtime databases to an external PostgreSQL instance that you provide. You can then access this data in the DATA_QUALITY_* tables (e.g. DATA_QUALITY_OBJECT_HISTORY), for instance as sketched below.
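Once the runtime databases have been externalized to PostgreSQL, a Python recipe can query that table through a DSS SQL connection. In this sketch, "runtime_db" is a hypothetical connection name pointing at that PostgreSQL instance, "dq_rule_history" is a hypothetical output dataset, and the table is read as-is since the exact columns depend on your DSS version:

```python
# Sketch: pull rule history out of the externalized runtime database.
# "runtime_db" is a hypothetical DSS connection to the PostgreSQL
# instance that now hosts the runtime tables; inspect
# DATA_QUALITY_OBJECT_HISTORY yourself to see its actual columns.
import dataiku
from dataiku import SQLExecutor2

executor = SQLExecutor2(connection="runtime_db")
history_df = executor.query_to_df(
    "SELECT * FROM DATA_QUALITY_OBJECT_HISTORY"
)

# Persist as a managed dataset so it can feed downstream recipes,
# charts, or dashboards; the output name is hypothetical.
dataiku.Dataset("dq_rule_history").write_with_schema(history_df)
```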