How do I calculate percentage in metrics ?
Hi,
I want to calculate a percentage of rows with missing values in my dataset.
To do this, I apply a filter recipe on my dataset with a "or" condition on all columns to check whether values is "is not defined".
With the number of rows of this filtered dataset , I can create a metric with the number of rows with at least one null value.
And I would like to divide this metric with the metric "count of records" of the original dataset to create a new metric with the percentage of rows with missing values.
I did not manage to create a metric which is a result of a calculation using values of other metrics.
Is It possible to do such thing ?
I tried to do it in the most simple way (using recipe and metrics on the new dataset) as people who will put in place this kind of feature are not necessarily used to write formulas in DSS language or SQL requests. But perhaps it is the only way to do this.
Can you give me some advices on creating such percentage metrics ?
Thanks
Answers
-
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @matisbenoit
,
If you want to calculate a percentage of two metrics using other metrics that currently exist in a dataset, you would need to create a custom Python metric.
You can do this using the Python API. First, in order to get the name of your numerator metric, you'll want to run the following in a Python notebook:import dataiku dataset = dataiku.Dataset('YOUR_DATASET') dataset.get_last_metric_values().get_all_ids()
This will tell you how to reference your metric IDs. Then, you can create a Python probe from the dataset with something like this:def process(dataset, partition_id): # on our dataset oject, get the last metric values computed. numerator = dataset.get_last_metric_values().get_metric_by_id('<REPLACE_WITH_NUMERATOR_METRIC_ID>')['lastValues'][0]['value'] # get the last metric value for the count of records in the dataset denominator = dataset.get_last_metric_values().get_metric_by_id('records:COUNT_RECORDS')['lastValues'][0]['value'] # divide your first metric by the total count of records and multiplate by 100 to get a percent. Rount to two decimal places percent = round(100*(int(numerator) / int(denominator)), 2) # return a string version of the percent so that it displays nicely in the UI return {'metric_name1' : str(percent) + '%'}
This is what it looks like on the metrics screen:
Let me know if you have any questions about this!
Thanks,
Sarina -
Thanks you for your clear answer
Matis