How to extract rows flagged by a custom Python rule in the Data Quality tab?

MathisC Registered Posts: 2 ✭✭

Hi everyone,

I'm working with Dataiku DSS version 13.5, and I'm using the Data Quality tab on datasets to define validation rules.

When I use standard rules (e.g., missing values, uniqueness, etc.), I can easily export the rows in error. However, when I define a custom Python rule, I can see the column status marked as "Fail", but I cannot extract the rows that triggered the failure.

Here’s what I’m trying to achieve:

  • Define one or more validation rules using Python in the Data Quality tab
  • Extract or flag the rows that failed those Python-based rules, just like with standard rules

Is there any way to:

  1. Display or mark the failing rows detected by a Python rule?
  2. Automatically add a flag column based on a Python rule?

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,531 Neuron
    edited July 8

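    Instead of using a Python rule, move your rule code to a Python recipe and have it write the result of the rule to a custom column. Then simply create a data quality rule that checks that column for the value 'Fail'. Since the status is now stored per row, the failing rows can be filtered or exported like any other rows.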
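
As a rough illustration of that suggestion, here is a minimal sketch of a row-level rule that writes its result to a flag column. The rule itself (a non-negative "amount"), the column name "rule_amount_status", and the dataset names in the comments are hypothetical placeholders, not something from this thread; only the 'Fail' value comes from the answer above.

```python
# Sketch of the "flag column" approach. The rule, the column names and the
# dataset names are hypothetical placeholders chosen for illustration.

def check_row(row):
    """Hypothetical rule: 'amount' must be present and non-negative."""
    amount = row.get("amount")
    # NaN != NaN, so the second test also catches pandas missing values.
    if amount is None or amount != amount or amount < 0:
        return "Fail"
    return "OK"


def add_rule_flag(rows, flag_column="rule_amount_status"):
    """Return copies of the rows with an extra status column."""
    return [{**row, flag_column: check_row(row)} for row in rows]


# Inside a DSS Python recipe, the same rule could be applied to a DataFrame
# roughly like this (dataset names are assumptions):
#
#   import dataiku
#   df = dataiku.Dataset("input_dataset").get_dataframe()
#   df["rule_amount_status"] = df.apply(lambda r: check_row(r.to_dict()), axis=1)
#   dataiku.Dataset("output_dataset").write_with_schema(df)

# Small stand-alone demonstration with plain dictionaries.
sample = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},
    {"id": 3, "amount": None},
]
flagged = add_rule_flag(sample)
failing_ids = [r["id"] for r in flagged if r["rule_amount_status"] == "Fail"]
print(failing_ids)
```

With the status stored per row, a standard rule on the output dataset can then check the flag column, and the failing rows can be filtered on the value 'Fail'.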
