Create a check to compare record count between two datasets
Hetesh
Hi, I am currently creating a dashboard that highlights basic checks on row counts and column counts. I would also like to add a check that compares a metric across two datasets, e.g. the row count of one dataset against the row count of another.
The reason for this check is to ensure that one-to-one cardinality is preserved after all the transformations and joins.
I am not sure how to reference another metric or another dataset from within the checks page.
Best Answer
Hi,
you can use the `dataiku` import in a Python check for that, and access the other dataset's metrics for comparison. For example:

    import dataiku

    def process(last_values, dataset, partition_id):
        this_dataset_record_count_metric = last_values.get('records:COUNT_RECORDS')
        this_dataset_record_count = int(this_dataset_record_count_metric.get_value()) if this_dataset_record_count_metric is not None else 0
        other_dataset = dataiku.Dataset("train_set")
        other_dataset_record_count = other_dataset.get_last_metric_values().get_global_value('records:COUNT_RECORDS')
        if this_dataset_record_count != other_dataset_record_count:
            return 'ERROR', 'record counts: %s <-> %s' % (this_dataset_record_count, other_dataset_record_count)
        else:
            return 'OK', 'record counts: %s' % this_dataset_record_count

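If you want the comparison to be a bit more defensive, for instance when one of the two metrics has not been computed yet or a value arrives as a string, the comparison step could be pulled out into a small function. This is only a sketch: `compare_record_counts` is a hypothetical helper name, not part of the `dataiku` API, and returning `'WARNING'` for a missing value is an assumption about how you would want that case reported.

```python
def compare_record_counts(this_count, other_count):
    """Hypothetical helper: build a (status, message) pair for a check."""
    try:
        # Normalize both values to int so that '10' and 10 compare equal.
        this_n = int(this_count)
        other_n = int(other_count)
    except (TypeError, ValueError):
        # Assumption: a missing or non-numeric metric is reported as a warning.
        return 'WARNING', 'record count unavailable: %s <-> %s' % (this_count, other_count)
    if this_n != other_n:
        return 'ERROR', 'record counts: %s <-> %s' % (this_n, other_n)
    return 'OK', 'record counts: %s' % this_n
```

Inside `process` you would then end with `return compare_record_counts(this_dataset_record_count, other_dataset_record_count)` instead of the `if`/`else`.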
Answers
perfect, that works great! thanks @fchataigner2