Auto Calculate Metrics from Plugin Recipe

gblack686
Level 4

I have set up some metrics on my output dataset, but they don't auto-update when the plugin recipe runs. Is this because the recipe comes from a plugin? Is there any way to run the metrics and checks, including SQL probes, at the end of my plugin recipe once the dataset is built?

Thanks in advance for all the help. This forum is a lifesaver. 

5 Replies
Alex_Combessie
Dataiker Alumni

Hi,

To auto-compute metrics and checks, you need to enable auto-computation on your output dataset. You can find this setting under Dataset > Status > Edit.

This is a one-time manual operation, and then the metrics and checks will always be updated when your recipe runs. Note that this is not specific to plugin recipes, but applies to all recipes.

To automate the creation and auto-computation of metrics and checks (which is a great idea, by the way), you can use our public API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#dataikuapi.dss.dataset.D.... This involves getting and setting elements in the dataset definition. Note that if you use SQL probes, you will need to constrain the output dataset type so that the probes are not applied to a non-SQL dataset. You can use the mustBeStrictlyType key for that in your recipe.json.
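
As a rough sketch of that API approach (not an official recipe): the snippet below sets every probe in a dataset definition to compute on build, saves the definition, then triggers the metrics and checks explicitly. It assumes `dataset` is a `dataikuapi` DSSDataset handle, and that probes are stored under `definition["metrics"]["probes"]` with `enabled` and `computeOnBuildMode` keys. That layout is an assumption to verify against your own dataset definition, and the helper names are made up for illustration.

```python
def enable_probes_on_build(definition, mode="WHOLE_DATASET"):
    """Mark every probe in a dataset definition as computed after each build.

    ASSUMPTION: probes are stored under definition["metrics"]["probes"] and
    use the "enabled" / "computeOnBuildMode" keys. Inspect a real definition
    (dataset.get_definition()) before relying on this layout.
    """
    probes = definition.setdefault("metrics", {}).setdefault("probes", [])
    for probe in probes:
        probe["enabled"] = True
        probe["computeOnBuildMode"] = mode
    return len(probes)


def refresh_metrics_and_checks(dataset):
    """Persist the probe settings, then compute metrics and run checks now.

    `dataset` is expected to behave like dataikuapi's DSSDataset, i.e. to
    expose get_definition, set_definition, compute_metrics and run_checks.
    """
    definition = dataset.get_definition()
    updated = enable_probes_on_build(definition)
    dataset.set_definition(definition)
    # Called without arguments, both methods use the metrics and checks
    # that are already configured on the dataset.
    metrics_result = dataset.compute_metrics()
    checks_result = dataset.run_checks()
    return updated, metrics_result, checks_result


# Example wiring (host, API key, project and dataset names are placeholders):
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
#   dataset = client.get_project("MY_PROJECT").get_dataset("my_output")
#   refresh_metrics_and_checks(dataset)
```

Keeping the definition update separate from the compute calls makes it possible to do the one-time setup once and call only `compute_metrics()` and `run_checks()` afterwards.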

Hope it helps,

Alex

gblack686
Level 4
Author

The metrics don't seem to be calculated after the plugin recipe runs. There is a warning and an error next to 'last run results'. I've added mustBeStrictlyType to both the input and output roles in recipe.json.

Run on SQL_Metrics
 java.lang.Exception : Query failed: ERROR: Integer data overflow (multiplication)
  Detail: 
  -----------------------------------------------
  error:  Integer data overflow (multiplication)
  code:      1058
  context:   
  query:     31377395
  location:  numeric_bound.cpp:121
  process:   query7_687_31377395 [pid=69411]
  -----------------------------------------------
 Stacktrace
java.lang.Exception: Query failed: ERROR: Integer data overflow (multiplication)
  Detail: 
  -----------------------------------------------
  error:  Integer data overflow (multiplication)
  code:      1058
  context:   
  query:     31377395
  location:  numeric_bound.cpp:121
  process:   query7_687_31377395 [pid=69411]
  -----------------------------------------------

	at com.dataiku.dip.metrics.engines.JdbcEngine.compute(JdbcEngine.java:152)
	at com.dataiku.dip.metrics.MetricsComputationService.computeMetrics(MetricsComputationService.java:301)
	at com.dataiku.dip.metrics.MetricsLaunchService$ComputeMetricsThread.execute(MetricsLaunchService.java:718)
	at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)

Alex_Combessie
Dataiker Alumni

Hi,

Could you share the code of the SQL probe here? I'll try to reproduce the issue.

Cheers,

Alex

gblack686
Level 4
Author

I have 171 metrics, and only 70 seem to auto-calculate. These include adv_col_stats:MODE, adv_col_stats:TOP10, basic:COUNT_COLUMNS, and sql_query:distinct_product_id:SQL Probe. Only one of my SQL probes auto-calculated, and none of my col_stats metrics did. All of my metrics are set to Auto Calculate after Build, and I have tried the Compute metrics when there is no additional incurred cost option both on and off. Below are my SQL probes.

SELECT
  COUNT(DISTINCT claim_type) AS distinct_claim_type,
  COUNT(DISTINCT claim_status) AS distinct_claim_status,
  COUNT(DISTINCT channel) AS distinct_channel,
  COUNT(DISTINCT state_code) AS distinct_state_code,
  COUNT(DISTINCT payer_name) AS distinct_payer_name,
  COUNT(DISTINCT plan_name) AS distinct_plan_name,
  COUNT(DISTINCT cbsa_code) AS distinct_cbsa_code
FROM ${DKU_DATASET_TABLE_NAME}

Probes 2, 3, 4, 5

SELECT COUNT(stable_24_yn) AS stable_24_yn_YES FROM ${DKU_DATASET_TABLE_NAME}
WHERE stable_24_yn = 'Y';

I can't find any pattern that separates the metrics that calculate from those that don't. The uncalculated metrics don't get a new timestamp under Status > Metrics, and they don't appear with the latest timestamp in my Metrics dataset.
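
One way to look for a pattern is to list which metric ids were refreshed by the last build and which were not. This is a minimal sketch, assuming the raw payload behind `dataset.get_last_metric_values()` in `dataikuapi` (its `.raw` attribute) is shaped like `{"metrics": [{"metric": {"id": ...}, "lastValues": [{"computed": <epoch milliseconds>}]}]}`. That shape and the function name `split_by_freshness` are assumptions for illustration, not something confirmed in this thread.

```python
def split_by_freshness(raw_metrics, build_started_ms):
    """Split metric ids into (fresh, stale) relative to a build start time.

    ASSUMPTION: raw_metrics follows the shape described above, with a
    "computed" timestamp in epoch milliseconds on each last value.
    A metric with no recorded value at all is reported as stale.
    """
    fresh, stale = [], []
    for entry in raw_metrics.get("metrics", []):
        metric_id = entry["metric"]["id"]
        timestamps = [v.get("computed", 0) for v in entry.get("lastValues", [])]
        latest = max(timestamps, default=0)
        if latest >= build_started_ms:
            fresh.append(metric_id)
        else:
            stale.append(metric_id)
    return sorted(fresh), sorted(stale)


# Example wiring (the dataset handle and timestamp are placeholders):
#   raw = dataset.get_last_metric_values().raw
#   fresh, stale = split_by_freshness(raw, build_started_ms=1700000000000)
```

Grouping the stale ids by their prefix (for example col_stats versus sql_query) may show whether a single failing probe type accounts for the missing values.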

Alex_Combessie
Dataiker Alumni

Hi, 

Could you please open a support ticket, attaching a diagnosis of the affected job (from the Job page, Actions > Download job diagnosis)?

Cheers,

Alex