Auto Calculate Metrics from Plugin Recipe
I have set up some metrics in my output dataset, but they don't auto-update when the plugin is run. Is this because it is a plugin? Is there any way to run the checks, including SQL probes, at the end of my plugin when the dataset is built?
Thanks in advance for all the help. This forum is a lifesaver.
Answers
-
Hi,
To auto-compute metrics and checks, you need to enable auto-computation on your output dataset. You can find this setting in the Dataset > Status > Edit screen.
This is a one-time manual operation, and then the metrics and checks will always be updated when your recipe runs. Note that this is not specific to plugin recipes, but applies to all recipes.
To automate the creation and auto-computation of metrics and checks (which is a great idea, by the way), you can use our public API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#dataikuapi.dss.dataset.DSSDataset. This involves getting and setting elements of the dataset definition. Note that if you use SQL probes, you will need to restrict the type of the output dataset so that the probes are not applied to a non-SQL dataset. You can use the mustBeStrictlyType key for that in your recipe.json.
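As a rough illustration of the API route, here is a minimal sketch, not a definitive implementation. It assumes the dataset handle behaves like dataikuapi.dss.dataset.DSSDataset and exposes compute_metrics() and run_checks(). The helper names refresh_metrics_and_checks and connect_dataset are hypothetical, and the host, API key, project key and dataset name are placeholders you would supply yourself.

```python
def refresh_metrics_and_checks(dataset, partition=""):
    """Recompute metrics, then run checks, on a dataset handle.

    `dataset` is assumed to behave like dataikuapi.dss.dataset.DSSDataset,
    i.e. to expose compute_metrics() and run_checks().
    """
    # Metrics first, so that checks evaluate the freshly computed values.
    metrics_result = dataset.compute_metrics(partition=partition)
    checks_result = dataset.run_checks(partition=partition)
    return {"metrics": metrics_result, "checks": checks_result}


def connect_dataset(host, api_key, project_key, dataset_name):
    """Hypothetical helper: build a dataset handle from outside DSS."""
    # Imported lazily so the helper above can be used without the package.
    import dataikuapi

    client = dataikuapi.DSSClient(host, api_key)
    return client.get_project(project_key).get_dataset(dataset_name)
```

Inside a recipe, the handle could presumably be obtained from dataiku.api_client() rather than from an explicit host and API key, and the helper called once the output has been written. Whether that is appropriate while the dataset is still being built is worth verifying on your instance.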
Hope it helps,
Alex
-
The metrics don't seem to be calculated after the plugin is run. There are a warning and an error next to 'last run results'. I've added mustBeStrictlyType to both the input and output roles in recipe.json.
Run on SQL_Metrics
java.lang.Exception : Query failed: ERROR: Integer data overflow (multiplication)
Detail:
-----------------------------------------------
error: Integer data overflow (multiplication)
code: 1058
context:
query: 31377395
location: numeric_bound.cpp:121
process: query7_687_31377395 [pid=69411]
-----------------------------------------------
Stacktrace
java.lang.Exception: Query failed: ERROR: Integer data overflow (multiplication)
Detail:
-----------------------------------------------
error: Integer data overflow (multiplication)
code: 1058
context:
query: 31377395
location: numeric_bound.cpp:121
process: query7_687_31377395 [pid=69411]
-----------------------------------------------
at com.dataiku.dip.metrics.engines.JdbcEngine.compute(JdbcEngine.java:152)
at com.dataiku.dip.metrics.MetricsComputationService.computeMetrics(MetricsComputationService.java:301)
at com.dataiku.dip.metrics.MetricsLaunchService$ComputeMetricsThread.execute(MetricsLaunchService.java:718)
at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)
-
Hi,
Could you share the code of the SQL probe here? I'll try to reproduce the issue.
Cheers,
Alex
-
I have 171 metrics and only 70 seem to auto-calculate. These include adv_col_stats:MODE, adv_col_stats:TOP10, basic:COUNT_COLUMNS and sql_query:distinct_product_id:SQL Probe. Only one of my SQL Probes auto-calculated. None of my col_stats calculated. All of my metrics are set to Auto Calculate after Build, and I have tried the "Compute metrics when there is not additional incurred cost" option both on and off. Below are my SQL Probes.
select
    count(distinct claim_type) as distinct_claim_type,
    count(distinct claim_status) as distinct_claim_status,
    count(distinct channel) as distinct_channel,
    count(distinct state_code) as distinct_state_code,
    count(distinct payer_name) as distinct_payer_name,
    count(distinct plan_name) as distinct_plan_name,
    count(distinct cbsa_code) as distinct_cbsa_code
FROM ${DKU_DATASET_TABLE_NAME}
Probes 2, 3, 4, 5
SELECT count(stable_24_yn) as stable_24_yn_YES
FROM ${DKU_DATASET_TABLE_NAME}
where stable_24_yn = 'Y';
I can't seem to find any pattern in the metrics that calculate vs those that don't. I've noticed that the uncalculated metrics aren't updated with the newest datetime at Status > Metrics, and they don't show up in my Metrics dataset with the newest datetime timestamp.
-
Hi,
Could you please open a support ticket, attaching a diagnosis of the affected job (from the Job page, Actions > Download job diagnosis)?
Cheers,
Alex