How to Automatically Create an Up-to-Date Dataset from Data Quality Rules?

Hi everyone,
I'm working on a project where I've applied several Data Quality rules to a dataset (MAST_prepared). Using the Data Quality tab, I clicked "Create dataset from rules data", which generated a new dataset (MAST_prepared_rules).
👉 Issue: this dataset is static. It does not update automatically when I rebuild the Flow, and it's connected to the original dataset by a dotted line in the Flow, which I understand means it's not part of the executable pipeline.
What I'm trying to achieve is a way to regenerate the data quality rule results every time the Flow is built, ideally using a scenario or recipe, so I can further process or visualize the rule violations.
❓ My question:
Is there a recommended or native way to:
- Create a dataset from data quality rules
- That updates dynamically in the Flow when upstream data changes
- Without having to manually rewrite the rules in Python?
Any advice or experience would be greatly appreciated — thanks a lot in advance! 🙏
Best Answer
Turribeach
You can get this data from the Python API and then build a dataset yourself as part of your Flow (see the sketch below).
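As a rough illustration of that approach, here is a minimal sketch of a Python recipe that runs inside the Flow. It assumes a recent DSS version where a dataset's data quality ruleset is reachable via get_data_quality_rules(); check the API reference for your exact version, as method and result field names may differ. MAST_prepared_rules_dynamic is a hypothetical output dataset name:

```python
# Sketch of a Python recipe that keeps rule results in the executable
# pipeline. Assumes DSS 12+ where DSSDataset exposes a data quality
# ruleset via get_data_quality_rules(); verify the exact method and
# result field names against the API reference for your DSS version.
import dataiku
import pandas as pd

client = dataiku.api_client()
project = client.get_default_project()

# Ruleset attached to the monitored dataset from the question
ruleset = project.get_dataset("MAST_prepared").get_data_quality_rules()

# Recompute the rules so the results reflect the freshly rebuilt data,
# then read back the latest status (result structure is assumed here).
ruleset.compute_rules()
raw_status = ruleset.get_status().get_raw()

rows = []
for rule in raw_status.get("ruleResults", []):  # field names assumed
    rows.append({
        "rule_id": rule.get("ruleId"),
        "outcome": rule.get("outcome"),
    })

# Writing to a managed dataset makes this step part of the Flow, so a
# scenario rebuild refreshes it. The output name is hypothetical.
output = dataiku.Dataset("MAST_prepared_rules_dynamic")
output.write_with_schema(pd.DataFrame(rows))
```

Because the output is a regular managed dataset fed by a recipe, it rebuilds (solid line, not dotted) whenever your scenario rebuilds the Flow.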
Alternatively, you can migrate the DSS runtime databases to an external PostgreSQL instance that you provide. You can then access this data in the DATA_QUALITY_* tables (e.g. DATA_QUALITY_OBJECT_HISTORY), for instance as sketched below.
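Once the runtime databases have been externalized to PostgreSQL, a Python recipe can query that table through a DSS SQL connection. In this sketch, "runtime_db" is a hypothetical connection name pointing at that PostgreSQL instance, "dq_rule_history" is a hypothetical output dataset, and the table is read as-is since the exact columns depend on your DSS version:

```python
# Sketch: pull rule history out of the externalized runtime database.
# "runtime_db" is a hypothetical DSS connection to the PostgreSQL
# instance that now hosts the runtime tables; inspect
# DATA_QUALITY_OBJECT_HISTORY yourself to see its actual columns.
import dataiku
from dataiku import SQLExecutor2

executor = SQLExecutor2(connection="runtime_db")
history_df = executor.query_to_df(
    "SELECT * FROM DATA_QUALITY_OBJECT_HISTORY"
)

# Persist as a managed dataset so it can feed downstream recipes,
# charts, or dashboards; the output name is hypothetical.
dataiku.Dataset("dq_rule_history").write_with_schema(history_df)
```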