Metrics and Checks

anaanike

I am archiving the runs of a recipe as CSV datasets in a folder on Dataiku and syncing the latest run to a separate dataset. Can I set up a check that compares the latest run with the previous run rather than with predefined numbers?

Turribeach

You haven't said exactly what you want to compare in your check, but this should be possible. As an example, I have a dataset with 3 columns where the third column is called "Some_Number". I add a Metric that calculates the Max of the "Some_Number" column under Column statistics. Then I go to Checks and create a custom Python check:

 

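def process(last_values, dataset):
    # Compute the current maximum directly from the dataset
    df = dataset.get_dataframe()
    current_val = df['Some_Number'].max()

    # Value stored by the last computation of the Max metric
    last_val = last_values['col_stats:MAX:Some_Number'].get_value()

    # Custom checks return 'OK', 'WARNING' or 'ERROR'
    if last_val != current_val:
        return 'WARNING'
    else:
        return 'OK'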

 

As you can see from the code, I first calculate the max of the column, then retrieve the last metric value and compare the two. The key for this to work is that the check must run BEFORE the metric is recalculated for the current run; otherwise you will lose the previous metric value once it is recalculated. Another option is to use scenario variables: compute the metrics before you build the dataset and store the metric value in a scenario variable, build the dataset, then compute the metrics again, store the new value in a second scenario variable, and compare the two in a later scenario step. But I am not really sure what you are trying to achieve, so the best approach will depend on your actual requirement: what do you need to compare, and why?
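If an exact match is too strict for your data, the comparison itself could be pulled into a small helper that tolerates a relative change and copes with the first run, when there is no previous value yet. This is only a sketch: `compare_with_previous` and `rel_tolerance` are hypothetical names I made up, not part of the Dataiku API, and it assumes both values can be converted to numbers.

```python
def compare_with_previous(previous, current, rel_tolerance=0.0):
    """Hypothetical helper: compare the current value with the previous one.

    Returns a (status, message) pair where status is 'OK' or 'WARNING'.
    rel_tolerance is the allowed relative change, e.g. 0.1 for 10%.
    """
    # First run: nothing to compare against yet
    if previous is None:
        return 'OK', 'No previous value to compare against'

    # Metric values may arrive as strings, so normalise to float
    previous = float(previous)
    current = float(current)

    # Relative change, guarding against division by zero
    if previous == 0:
        drift = 0.0 if current == 0 else float('inf')
    else:
        drift = abs(current - previous) / abs(previous)

    if drift > rel_tolerance:
        return 'WARNING', 'Value changed from %s to %s' % (previous, current)
    return 'OK', 'Value within tolerance'
```

Inside the custom check you would then call it with the two values from the code above, for example compare_with_previous(last_val, current_val, 0.1), and return its result from process.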
