I'm trying to create an efficient probe where I can compute multiple metrics for multiple columns in the dataset. For simplicity, let's say that I want a count of all 0's, 1's, and 2's.
With perColumn and singleAggregate set to true, I can create a probe that counts zeroes:
probe.json
{
"meta" : {
"label" : "Number of zeroes",
"description" : "Number of zeroes",
"icon" : "icon-puzzle-piece"
},
"restrictToDatasetType" : "",
"canHive" : true,
"canImpala" : true,
"canSpark" : true,
"singleAggregate" : true,
"perColumn" : true
}
probe.sql
SUM(CASE WHEN ${column} = 0 THEN 1 ELSE 0 END)
However, I don't want to create another probe to count the 1's, I want the same probe to tackle that calculation. My guess is to set singleAggregate to false, but then I can't figure out how to write the query. The following is my current attempt:
probe.sql
SELECT
SUM(CASE WHEN ${column} = 0 THEN 1 ELSE 0 END) AS zero_count,
SUM(CASE WHEN ${column} = 1 THEN 1 ELSE 0 END) AS one_count
FROM ${DKU_DATASET_TABLE_NAME} ${DKU_WHERE_PARTITION_FILTER}
Running this probe throws an error: Incorrect syntax near '{'.
Hi,
the mode in which you can have the same probe applied to several column is only possible when there is a single aggregate, so what you're trying to build is not possible.
Regards,
Frederic
Ah, okay. So if I would write different probes for each individual metric, would Dataiku calculate all these probes in one pass on the table, or execute a separate pass for each probe?
Hi,
when you have activated metric probes on a dataset, you can get a preview of the "computation runs" that DSS will do at the top of the Status > Edit > Metrics page (the little → is actually a button with a dropdown). For instance, your metrics should end up on the same computation run and thus be a single query to the database.
Note that you can cheat you way out by building the aggregates list in a custom UI (see https://doc.dataiku.com/dss/latest/plugins/reference/other.html#custom-settings-ui 😞 make a JS controller that takes the form values to build an aggregates property in the probe's config, and use it in a simple SQL probe like
SELECT ${aggregates} FROM ${DKU_DATASET_TABLE_NAME} ${DKU_WHERE_PARTITION_FILTER}
Regards,
Frederic