Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
I'm trying to create an efficient probe where I can compute multiple metrics for multiple columns in the dataset. For simplicity, let's say that I want a count of all 0's, 1's, and 2's.
With perColumn and singleAggregate set to true, I can create a probe that counts zeroes:
probe.json
{
"meta" : {
"label" : "Number of zeroes",
"description" : "Number of zeroes",
"icon" : "icon-puzzle-piece"
},
"restrictToDatasetType" : "",
"canHive" : true,
"canImpala" : true,
"canSpark" : true,
"singleAggregate" : true,
"perColumn" : true
}
probe.sql
SUM(CASE WHEN ${column} = 0 THEN 1 ELSE 0 END)
However, I don't want to create another probe to count the 1's, I want the same probe to tackle that calculation. My guess is to set singleAggregate to false, but then I can't figure out how to write the query. The following is my current attempt:
probe.sql
SELECT
SUM(CASE WHEN ${column} = 0 THEN 1 ELSE 0 END) AS zero_count,
SUM(CASE WHEN ${column} = 1 THEN 1 ELSE 0 END) AS one_count
FROM ${DKU_DATASET_TABLE_NAME} ${DKU_WHERE_PARTITION_FILTER}
Running this probe throws an error: Incorrect syntax near '{'.
Hi,
the mode in which you can have the same probe applied to several column is only possible when there is a single aggregate, so what you're trying to build is not possible.
Regards,
Frederic
Ah, okay. So if I would write different probes for each individual metric, would Dataiku calculate all these probes in one pass on the table, or execute a separate pass for each probe?
Hi,
when you have activated metric probes on a dataset, you can get a preview of the "computation runs" that DSS will do at the top of the Status > Edit > Metrics page (the little โ is actually a button with a dropdown). For instance, your metrics should end up on the same computation run and thus be a single query to the database.
Note that you can cheat you way out by building the aggregates list in a custom UI (see https://doc.dataiku.com/dss/latest/plugins/reference/other.html#custom-settings-ui ๐ make a JS controller that takes the form values to build an aggregates property in the probe's config, and use it in a simple SQL probe like
SELECT ${aggregates} FROM ${DKU_DATASET_TABLE_NAME} ${DKU_WHERE_PARTITION_FILTER}
Regards,
Frederic