Metrics and Checks
Hello-
I have a dataset with a product_type column that I create with a Prepare recipe. I need to make sure that not too many rows fall into the "other" category (which is the default). I'd like to have a check that says something like:
sum(if ( strval('PRODUCT_TYPE') == 'other', BALANCE,0))/sum(BALANCE)*100 > .25 then fail.
I tried to set it up using the Cell Value Probe, but it doesn't seem to work. Is there another way to do this without using Python?
Also, is there a way to compare a metric from one dataset to another? In my example, we don't want to see the total balance change as we work our way through the flow.
Thanks!
Mindy
Operating system used: Linux
Best Answer
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @mbillingham,
Using a Python check would be the best approach for the more complex logic that you want to apply to your dataset.
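As a rough illustration of what such a Python check could look like, here is a minimal sketch. It assumes the column names PRODUCT_TYPE and BALANCE from the question, takes the 0.25 threshold literally as a percentage, and assumes that dataset.iter_rows() yields rows you can index by column name. The helper other_share_pct is a made-up name for this example, so adapt it to your own dataset.

```python
# Threshold in percent, taken literally from the question; adjust as needed.
THRESHOLD_PCT = 0.25


def other_share_pct(rows, type_col="PRODUCT_TYPE", balance_col="BALANCE"):
    """Return the percentage of the total balance held by rows typed 'other'."""
    other_total = 0.0
    grand_total = 0.0
    for row in rows:
        raw = row[balance_col]
        # Treat missing or empty balances as zero.
        balance = float(raw) if raw not in (None, "") else 0.0
        grand_total += balance
        if str(row[type_col]) == "other":
            other_total += balance
    if grand_total == 0:
        return 0.0
    return other_total / grand_total * 100


# Entry point in the shape of a Dataiku Python check:
# it returns an outcome ('OK', 'WARNING' or 'ERROR') and an optional message.
def process(last_values, dataset, partition_id):
    # Assumption: dataset is a dataiku.Dataset and iter_rows() yields dict-like rows.
    pct = other_share_pct(dataset.iter_rows())
    message = "%.2f%% of BALANCE is in 'other'" % pct
    if pct > THRESHOLD_PCT:
        return "ERROR", message
    return "OK", message
```

Keeping the arithmetic in a separate function makes it easy to try out in a notebook on a few sample rows before pasting it into the check. Note that this reads every row, so on a large dataset it may be cheaper to compute the two sums as metrics first.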
Regarding comparing metrics from one dataset to another, you can use the Python API to get the current metric values for a different dataset. For example, within a Python metric probe or check you can load another dataset and then call get_last_metric_values() on it to pull the most recent metrics. You can then compare them to your current dataset!

```python
import dataiku

# "training" is another dataset
mydataset = dataiku.Dataset("training")

# Define here a function that returns the metric.
def process(dataset, partition_id):
    # dataset is a dataiku.Dataset object
    other_dataset_metric = mydataset.get_last_metric_values().get_metric_by_id('records:COUNT_RECORDS')['lastValues'][0]['value']
    # Illustrative metric name; return whatever your probe should expose
    return {'training_record_count': other_dataset_metric}
```
I would suggest testing out the get_metric_by_id() call in a Python notebook until you have the exact syntax that you want to use to pull the relevant metrics.
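For the "balance should not change through the flow" case, the comparison itself could be sketched as below. This is only an illustration: last_metric_value and compare_totals are hypothetical helper names, the tolerance is arbitrary, and the metric id col_stats:SUM:BALANCE in the comments is an assumption that you should confirm against the ids your dataset actually reports.

```python
def last_metric_value(dataset, metric_id):
    """Read the most recent value of one metric as a float."""
    metric = dataset.get_last_metric_values().get_metric_by_id(metric_id)
    return float(metric["lastValues"][0]["value"])


def compare_totals(current, upstream, tolerance=0.01):
    """Return a check outcome: 'ERROR' if the totals differ by more than tolerance."""
    diff = abs(current - upstream)
    message = "current=%.2f, upstream=%.2f, diff=%.2f" % (current, upstream, diff)
    if diff > tolerance:
        return "ERROR", message
    return "OK", message


# Possible wiring inside a Python check (dataset name and metric id are assumptions):
#
# import dataiku
# upstream = dataiku.Dataset("training")
#
# def process(last_values, dataset, partition_id):
#     metric_id = "col_stats:SUM:BALANCE"
#     return compare_totals(last_metric_value(dataset, metric_id),
#                           last_metric_value(upstream, metric_id))
```

Both datasets need to have the metric computed already, since get_last_metric_values() only returns values that were stored by an earlier metrics run.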
I hope that is helpful, let us know if you have any questions about this!
Thanks,
Sarina
Answers
Thank you!