Auto Calculate Metrics from Plugin Recipe

gblack686
gblack686 Partner, Registered Posts: 62 Partner

I have set up some metrics in my output dataset, but they don't auto-update when the plugin is run. Is this because it is a plugin? Is there any way to run the checks, including SQL probes, at the end of my plugin when the dataset is built?

Thanks in advance for all the help. This forum is a lifesaver.

Answers

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    To auto-compute metrics and checks, you need to activate it on your output dataset. You can find this setting in the Dataset > Status > Edit screen.

    This is a one-time manual operation, and then the metrics and checks will always be updated when your recipe runs. Note that this is not specific to plugin recipes, but applies to all recipes.

    To automate the creation and auto-computation of metrics and checks (which is a great idea, by the way), you can use our public API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#dataikuapi.dss.dataset.DSSDataset. This involves getting and setting elements in the dataset definition. Note that if you use SQL probes, you will need to restrict the type of the output dataset so that the probes are never applied to a non-SQL dataset. You can use the mustBeStrictlyType key for that in your recipe.json.
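
A minimal Python sketch of the get/set approach described above, not a definitive implementation. It assumes the dataset definition stores probes under `metrics.probes` with `enabled` and `computeOnBuildMode` keys, and a `runOnBuild` flag under `metricsChecks`. These key names and the `"PARTITION"` value are assumptions, so compare them with what `get_definition()` returns on your own instance. The host, API key, project key, and dataset name are hypothetical placeholders.

```python
import copy


def enable_probes_on_build(definition, mode="PARTITION"):
    """Return a copy of a dataset definition with all probes set to run after a build.

    The key names ("metrics", "probes", "computeOnBuildMode", "metricsChecks",
    "runOnBuild") are assumptions about the definition layout.
    """
    updated = copy.deepcopy(definition)
    metrics = updated.setdefault("metrics", {})
    for probe in metrics.setdefault("probes", []):
        probe["enabled"] = True
        probe["computeOnBuildMode"] = mode
    # Also ask for the checks to run after each build.
    updated.setdefault("metricsChecks", {})["runOnBuild"] = True
    return updated


def apply_to_dataset(host, api_key, project_key, dataset_name):
    """Push the updated definition, then compute metrics and run checks once."""
    import dataikuapi  # imported lazily so the helper above works without it

    client = dataikuapi.DSSClient(host, api_key)
    dataset = client.get_project(project_key).get_dataset(dataset_name)
    dataset.set_definition(enable_probes_on_build(dataset.get_definition()))
    dataset.compute_metrics()  # explicit computation, e.g. at the end of a recipe
    dataset.run_checks()


if __name__ == "__main__":
    # Offline demonstration on a hand-written definition fragment.
    sample = {"metrics": {"probes": [{"type": "sql_query", "enabled": False}]}}
    print(enable_probes_on_build(sample))
```

Keeping the dictionary edit separate from the API calls makes it possible to inspect the modified definition before saving it. A call such as `apply_to_dataset("https://dss.example.com", "MY_API_KEY", "MYPROJECT", "SQL_Metrics")` would then apply it, with all four arguments replaced by your own values.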

    Hope it helps,

    Alex

  • gblack686
    gblack686 Partner, Registered Posts: 62 Partner
    edited July 17

    The metrics don't seem to be calculated after the plugin runs. A warning and an error appear next to 'last run results'. I've added mustBeStrictlyType to both the input and output roles in recipe.json.

    Run on SQL_Metrics
     java.lang.Exception : Query failed: ERROR: Integer data overflow (multiplication)
      Detail: 
      -----------------------------------------------
      error:  Integer data overflow (multiplication)
      code:      1058
      context:   
      query:     31377395
      location:  numeric_bound.cpp:121
      process:   query7_687_31377395 [pid=69411]
      -----------------------------------------------
     Stacktrace
    java.lang.Exception: Query failed: ERROR: Integer data overflow (multiplication)
      Detail: 
      -----------------------------------------------
      error:  Integer data overflow (multiplication)
      code:      1058
      context:   
      query:     31377395
      location:  numeric_bound.cpp:121
      process:   query7_687_31377395 [pid=69411]
      -----------------------------------------------
    
        at com.dataiku.dip.metrics.engines.JdbcEngine.compute(JdbcEngine.java:152)
        at com.dataiku.dip.metrics.MetricsComputationService.computeMetrics(MetricsComputationService.java:301)
        at com.dataiku.dip.metrics.MetricsLaunchService$ComputeMetricsThread.execute(MetricsLaunchService.java:718)
        at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)
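
For reference, the role restriction mentioned at the top of this post (mustBeStrictlyType on both the input and output roles) might look roughly like the fragment generated below. This is only a sketch: the role names, labels, and the `"PostgreSQL"` type are hypothetical placeholders rather than the values used in the actual plugin, and the surrounding keys follow the usual recipe.json role layout as an assumption.

```python
import json


def restricted_role(name, label, dataset_type):
    """Build one recipe.json role entry limited to a single dataset type."""
    return {
        "name": name,
        "label": label,
        "arity": "UNARY",
        "required": True,
        "acceptsDataset": True,
        # Key named in the thread; the value is a placeholder dataset type.
        "mustBeStrictlyType": dataset_type,
    }


recipe_roles = {
    "inputRoles": [restricted_role("input_dataset", "Input dataset", "PostgreSQL")],
    "outputRoles": [restricted_role("output_dataset", "Output dataset", "PostgreSQL")],
}

# Print the fragment in the form it would take inside recipe.json.
print(json.dumps(recipe_roles, indent=4))
```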

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    Could you share the code of the SQL probe here? I'll try to reproduce the issue.

    Cheers,

    Alex

  • gblack686
    gblack686 Partner, Registered Posts: 62 Partner
    edited July 17

    I have 171 metrics, and only 70 seem to auto-calculate. These include adv_col_stats:MODE, adv_col_stats:TOP10, basic:COUNT_COLUMNS, and sql_query:distinct_product_id:SQL Probe. Only one of my SQL probes auto-calculated, and none of my col_stats metrics did. All of my metrics are set to Auto Calculate after Build, and I have tried the "Compute metrics when there is no additional incurred cost" option both on and off. Below are my SQL probes.

    SELECT
        count(distinct claim_type) as distinct_claim_type,
        count(distinct claim_status) as distinct_claim_status,
        count(distinct channel) as distinct_channel,
        count(distinct state_code) as distinct_state_code,
        count(distinct payer_name) as distinct_payer_name,
        count(distinct plan_name) as distinct_plan_name,
        count(distinct cbsa_code) as distinct_cbsa_code
    FROM ${DKU_DATASET_TABLE_NAME}

    Probes 2, 3, 4, 5

    SELECT count(stable_24_yn) as stable_24_yn_YES FROM ${DKU_DATASET_TABLE_NAME}
    WHERE stable_24_yn = 'Y';

    I can't find any pattern that separates the metrics that calculate from those that don't. The uncalculated metrics don't get a new timestamp at Status > Metrics, and they don't show up in my Metrics dataset with the latest timestamp either.

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    Could you please open a support ticket, attaching a diagnosis of the affected job (from the Job page, Actions > Download job diagnosis)?

    Cheers,

    Alex
