I would like to run a check that fails if a column "col1" in my dataset has duplicate values. In the metrics tab I am running the "Distinct value count" on col1 and "Records Counts" on the table. How do I write a custom Python check that compares the "Distinct value count" on col1 with "Records Counts" to verify that col1 is unique?
Hi,
Here is an example of such a Python check:
# Define here a function that returns the outcome of the check.
def process(last_values, dataset, partition_id):
    # last_values is a dict of the last values of the metrics,
    # with the values as a dataiku.metrics.MetricDataPoint.
    # dataset is a dataiku.Dataset object
    #count_record = last_values["records:COUNT_RECORDS"]["raw"]["value"]
    #count_distinct =
    if last_values["records:COUNT_RECORDS"].get_value() == last_values["col_stats:COUNT_DISTINCT:<PUT_YOUR_COLUMN_NAME_HERE>"].get_value():
        return ("OK", "no duplicate")
    else:
        return ("ERROR", "duplicates")
[EDIT] I had forgotten to call the get_value() method on last_values["..."]
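For trying the logic outside DSS, here is a rough sketch of the same check with two optional additions: it reports an error when one of the metrics has not been computed yet, and it casts both values to int before comparing. The int cast is only a precaution, on the assumption that get_value() may return the counts as strings. FakeMetricDataPoint is a hypothetical stand-in for dataiku.metrics.MetricDataPoint so the function can be run locally, and "col1" is the column name from the question.

```python
class FakeMetricDataPoint:
    """Hypothetical stand-in for dataiku.metrics.MetricDataPoint (local testing only)."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value


RECORDS_METRIC = "records:COUNT_RECORDS"
# "col1" is the column from the question; replace it with your own column name.
DISTINCT_METRIC = "col_stats:COUNT_DISTINCT:col1"


def process(last_values, dataset, partition_id):
    # Fail with a readable message if a required metric is absent from last_values.
    missing = [m for m in (RECORDS_METRIC, DISTINCT_METRIC) if m not in last_values]
    if missing:
        return ("ERROR", "metrics not computed: " + ", ".join(missing))

    # Assumption: values may arrive as strings, so normalize to int first.
    record_count = int(last_values[RECORDS_METRIC].get_value())
    distinct_count = int(last_values[DISTINCT_METRIC].get_value())

    if record_count == distinct_count:
        return ("OK", "no duplicate")
    surplus = record_count - distinct_count
    return ("ERROR", "duplicates: %d row(s) beyond the distinct values" % surplus)


# Local usage with made-up metric values.
unique = {
    RECORDS_METRIC: FakeMetricDataPoint("3"),
    DISTINCT_METRIC: FakeMetricDataPoint("3"),
}
print(process(unique, None, None))  # ('OK', 'no duplicate')
```

Inside DSS only the process function is needed; the stand-in class and the usage lines are there for local testing.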