Submit your innovative use case or inspiring success story to the 2023 Dataiku Frontrunner Awards! LET'S GO

How do I calculate percentage in metrics ?

matisbenoit
Level 1
How do I calculate percentage in metrics ?

Hi,

I want to calculate a percentage of rows with missing values in my dataset.

To do this, I apply a filter recipe on my dataset with a "or" condition on all columns to check whether values is "is not defined".

With the number of rows of this filtered dataset , I can create a metric with the number of rows with at least one null value.

And I would like to divide this metric with the metric "count of records" of the original dataset to create a new metric with the percentage of rows with missing values.

I did not manage to create a metric which is a result of a calculation using values of other metrics.

Is It possible to do such thing ?

I tried to do it in the most simple way (using recipe and metrics on the new dataset) as people who will put in place this kind of feature are not necessarily used to write formulas in DSS language or SQL requests. But perhaps it is the only way to do this.

Can you give me some advices on creating such percentage metrics ?

Thanks

0 Kudos
2 Replies
SarinaS
Dataiker

Hi @matisbenoit,

If you want to calculate a percentage of two metrics using other metrics that currently exist in a dataset, you would need to create a custom Python metric.  

You can do this using the Python API. First, in order to get the name of your numerator metric, you'll want to run the following in a Python notebook:

import dataiku 

dataset = dataiku.Dataset('YOUR_DATASET')
dataset.get_last_metric_values().get_all_ids()


This will tell you how to reference your metric IDs. Then, you can create a Python probe from the dataset with something like this:

def process(dataset, partition_id):
    # on our dataset oject, get the last metric values computed. 
    numerator = dataset.get_last_metric_values().get_metric_by_id('<REPLACE_WITH_NUMERATOR_METRIC_ID>')['lastValues'][0]['value']
    # get the last metric value for the count of records in the dataset 
    denominator = dataset.get_last_metric_values().get_metric_by_id('records:COUNT_RECORDS')['lastValues'][0]['value']
    # divide your first metric by the total count of records and multiplate by 100 to get a percent. Rount to two decimal places 
    percent =  round(100*(int(numerator) / int(denominator)), 2)
    # return a string version of the percent so that it displays nicely in the UI
    return {'metric_name1' : str(percent) + '%'}

Screenshot 2023-05-01 at 5.24.54 PM.png

This is what it looks like on the metrics screen:

Screenshot 2023-05-01 at 5.29.43 PM.png

Let me know if you have any questions about this!

Thanks,
Sarina 

0 Kudos
matisbenoit
Level 1
Author

Thanks you for your clear answer

Matis

0 Kudos