Metrics and Checks

mbillingham
Level 2

Hello,

I have a dataset with a product_type column that I create with a prepare recipe. I need to make sure that not too many rows fall into the "other" category (which is the default). I'd like to have a check that says something like:

sum(if ( strval('PRODUCT_TYPE') == 'other', BALANCE,0))/sum(BALANCE)*100 > .25 then fail.

I tried to set it up with the Cell Value probe, but it doesn't seem to work. Is there another way to do this without using Python?

Also, is there a way to compare a metric from one dataset to another? In my example, we don't want the balance to change as we work our way through the flow.

Thanks!

Mindy


Operating system used: Linux

SarinaS
Dataiker

Hi @mbillingham,

Using a Python check would be the best approach for the more complex logic that you want to apply to your dataset. 
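To make this concrete, here is a minimal sketch of the "other" share logic in plain Python. The helper names (other_balance_pct, check_other_share) and the max_pct parameter are hypothetical, not part of the Dataiku API. The threshold of 0.25 follows the formula in the question literally (the ratio is multiplied by 100, so it means 0.25%); adjust it if 25% was intended. The wiring into a check's process() function is shown only in comments, as an assumption about how a custom Python check is usually laid out.

```python
def other_balance_pct(rows, other_label="other"):
    """Return the percentage of total BALANCE held by rows whose
    PRODUCT_TYPE equals other_label. rows is any iterable of dict-like rows."""
    total = 0.0
    other = 0.0
    for row in rows:
        balance = float(row["BALANCE"] or 0)  # treat missing balances as 0
        total += balance
        if str(row["PRODUCT_TYPE"]) == other_label:
            other += balance
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty or all-zero dataset
    return other / total * 100


def check_other_share(rows, max_pct):
    """Return an (outcome, message) pair: 'ERROR' when the share exceeds max_pct."""
    pct = other_balance_pct(rows)
    message = "other share of balance: {:.4f}% (limit {}%)".format(pct, max_pct)
    if pct > max_pct:
        return "ERROR", message
    return "OK", message


# Assumed wiring inside a Dataiku custom Python check (not executed here):
#
# def process(last_values, dataset, partition_id):
#     rows = dataset.get_dataframe().to_dict("records")
#     return check_other_share(rows, max_pct=0.25)
```

Keeping the calculation in a plain function makes it easy to try out in a notebook on a handful of rows before attaching it to the dataset.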

Regarding comparing metrics from one dataset to another, you can use the Python API to get the most recent metric values of a different dataset. For example, within a Python metric probe or check you can instantiate the other dataset and call get_last_metric_values() on it to pull its latest metrics. You can then compare them with the metrics of your current dataset.

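import dataiku

# "training" is the other dataset whose metrics we want to read
mydataset = dataiku.Dataset("training")

# Define here a function that returns the metric.
def process(dataset, partition_id):
    # dataset is a dataiku.Dataset object (the dataset this probe runs on)
    # Read the most recent record count computed on the other dataset
    other_dataset_metric = mydataset.get_last_metric_values().get_metric_by_id('records:COUNT_RECORDS')['lastValues'][0]['value']
    # Return it as a named metric (the name is only illustrative)
    return {'training_record_count': other_dataset_metric}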

 
I would suggest testing out the get_metric_by_id() call in a Python notebook until you have the exact syntax that you want to use to pull the relevant metrics. 
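Once you can pull both values, the comparison itself is small. The sketch below assumes each metric is a dict shaped like the get_metric_by_id() result used above (a 'lastValues' list whose first entry holds 'value'). The function names and the rel_tol parameter are hypothetical, and the metric id you would need for the balance total (possibly something like 'col_stats:SUM:BALANCE') is an assumption to confirm in the notebook.

```python
import math


def last_value(metric):
    """Extract the most recent value from a metric dict shaped like
    {'lastValues': [{'value': ...}, ...]} and convert it to float."""
    return float(metric["lastValues"][0]["value"])


def compare_balances(current_metric, upstream_metric, rel_tol=1e-9):
    """Return an (outcome, message) pair: 'OK' when the two metric values
    match within rel_tol, 'ERROR' otherwise."""
    current = last_value(current_metric)
    upstream = last_value(upstream_metric)
    message = "current={} upstream={}".format(current, upstream)
    if math.isclose(current, upstream, rel_tol=rel_tol):
        return "OK", message
    return "ERROR", message


# Stand-in metric dicts for illustration; in a check these would come from
# get_last_metric_values().get_metric_by_id(...) on each dataset.
upstream = {"lastValues": [{"value": "1050.25"}]}
current = {"lastValues": [{"value": 1050.25}]}
outcome, message = compare_balances(current, upstream)
```

A small relative tolerance is used instead of strict equality because sums of floating-point balances can differ in the last digits depending on how each dataset was computed.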

I hope that is helpful, let us know if you have any questions about this! 

Thanks,
Sarina

mbillingham
Level 2
Author

Thank you!
