How do I calculate percentage in metrics ?

matisbenoit
matisbenoit Registered Posts: 2

Hi,

I want to calculate a percentage of rows with missing values in my dataset.

To do this, I apply a filter recipe on my dataset with a "or" condition on all columns to check whether values is "is not defined".

With the number of rows of this filtered dataset , I can create a metric with the number of rows with at least one null value.

And I would like to divide this metric with the metric "count of records" of the original dataset to create a new metric with the percentage of rows with missing values.

I did not manage to create a metric which is a result of a calculation using values of other metrics.

Is It possible to do such thing ?

I tried to do it in the most simple way (using recipe and metrics on the new dataset) as people who will put in place this kind of feature are not necessarily used to write formulas in DSS language or SQL requests. But perhaps it is the only way to do this.

Can you give me some advices on creating such percentage metrics ?

Thanks

Tagged:

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    edited July 17

    Hi @matisbenoit
    ,

    If you want to calculate a percentage of two metrics using other metrics that currently exist in a dataset, you would need to create a custom Python metric.

    You can do this using the Python API. First, in order to get the name of your numerator metric, you'll want to run the following in a Python notebook:

    import dataiku 
    
    dataset = dataiku.Dataset('YOUR_DATASET')
    dataset.get_last_metric_values().get_all_ids()
    


    This will tell you how to reference your metric IDs. Then, you can create a Python probe from the dataset with something like this:

    def process(dataset, partition_id):
        # on our dataset oject, get the last metric values computed. 
        numerator = dataset.get_last_metric_values().get_metric_by_id('<REPLACE_WITH_NUMERATOR_METRIC_ID>')['lastValues'][0]['value']
        # get the last metric value for the count of records in the dataset 
        denominator = dataset.get_last_metric_values().get_metric_by_id('records:COUNT_RECORDS')['lastValues'][0]['value']
        # divide your first metric by the total count of records and multiplate by 100 to get a percent. Rount to two decimal places 
        percent =  round(100*(int(numerator) / int(denominator)), 2)
        # return a string version of the percent so that it displays nicely in the UI
        return {'metric_name1' : str(percent) + '%'}
    

    Screenshot 2023-05-01 at 5.24.54 PM.png

    This is what it looks like on the metrics screen:

    Screenshot 2023-05-01 at 5.29.43 PM.png

    Let me know if you have any questions about this!

    Thanks,
    Sarina 

  • matisbenoit
    matisbenoit Registered Posts: 2

    Thanks you for your clear answer

    Matis

Setup Info
    Tags
      Help me…