
Recommended Way for Comparing Metrics between Multiple Flow Datasets in a Check



I was wondering if there is a recommended way to create a check on a dataset that references metrics computed from another dataset in the flow (most likely an ancestor dataset).

For instance, checking whether the parent and child datasets differ in the min and max values of a particular column.

Thank you for your help!




To achieve this, I believe you will need to create a custom check using Python. You can retrieve metric values for the various datasets in your project through the dataikuapi package. Below is some starter code (to use in a notebook or recipe, for example) that shows how to use the API to compare metric values between a source and a target dataset:

import dataiku

client = dataiku.api_client()
proj = client.get_project(dataiku.default_project_key())

source_dataset = 'tweets_scored_aggregate'
target_dataset = 'tweets_scored_aggregate_prepared'

# Metric IDs (as displayed in the dataset's Status > Metrics tab)
record_count = 'records:COUNT_RECORDS'
unique_values = 'col_stats:COUNT_DISTINCT:text'

# Retrieve the last computed value of the metric on each dataset
source_metrics = proj.get_dataset(source_dataset).get_last_metric_values()
source_record_count = source_metrics.get_metric_by_id(record_count)['lastValues'][0]['value']

target_metrics = proj.get_dataset(target_dataset).get_last_metric_values()
target_record_count = target_metrics.get_metric_by_id(record_count)['lastValues'][0]['value']

# Compute the difference between the two record counts
gap = float(source_record_count) - float(target_record_count)

Then, in your custom check, you can set the business rule for the check outcome based on the 'gap' value.
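As a minimal sketch of such a business rule: a custom check ultimately maps the gap to an outcome string such as 'OK', 'WARNING', or 'ERROR' (the convention DSS uses for check results), optionally with a message. The function name and thresholds below are illustrative, not part of the original answer:

```python
# Hypothetical helper: turn the computed gap into a check outcome.
# Threshold values are placeholders; tune them to your use case.
def gap_outcome(gap, warn_threshold=0.0, error_threshold=100.0):
    """Map the source/target record-count gap to a (outcome, message) pair."""
    if abs(gap) > error_threshold:
        return 'ERROR', 'Gap of %s exceeds the error threshold' % gap
    if abs(gap) > warn_threshold:
        return 'WARNING', 'Gap of %s exceeds the warning threshold' % gap
    return 'OK', 'Record counts match within tolerance'
```

You would call this at the end of the custom check code, passing in the gap computed above.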

Hope this helps


Thank you for your help!
