Median on grouping data
I suggest Dataiku add a feature to calculate median easily, although it is available inside 'Analyze' for numeric columns.
But if someone wants to find the median of each group after grouping, like we can find min, max & avg, etc.
Adding a median feature after grouping data will definitely help the user by giving them a smooth experience.
Comments
-
Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 162 Dataiker
Hi @PANKAJ
,Thanks for submitting this idea. You'll be pleased to hear this idea is in our backlog--adding median to the list of available aggregation--and we are determining the next steps for development. We can't provide a timeline at this point, but be sure to check back for updates.
For everyone else, kudos the original post to signal that you're interested in Dataiku developing and releasing this feature!
Take care,
Ashley
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
If we are going to add Median, what about Mode as well?
Second, while we are waiting for this feature.
Has anyone worked out the syntax to add these calculations to a window recipe as a custom aggregation? I'm running Postgres as the underlying DBMS.
-
As long as we're adding averages, geometric and harmonic means and RMS would be nice to have also.
@tgb417
for Postgres, trypercentile_cont(0.5) WITHIN GROUP (ORDER BY num_value) as median_value, mode() WITHIN GROUP (ORDER BY num_value) as modal_value
For geometric and harmonic means in Postgres, see this link
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
I've figured out how to do an Average as a custom aggregation using the following.
Note that one needs to drop the explicit column alias
as custome_AVG_1
If you call out the column name in the custom in the SQL chunk you provide for the custom aggregation you break things.
However, I've not yet figured out how to do the precentile_con(0.5) and mode(). For example, if I try something like this:
I get an error message. In a detailed search of the dataiku.com website, I've not found a solution.
-
@tgb417
it looks like Postgres doesn't directly support these as window functions, but you can work around it with a join: https://stackoverflow.com/questions/39683330/percentile-calculation-with-a-window-function -
Another workaround :
1. In the group by you can concatenate the values of interest.
2. Then in a prepare recipe, with a python step, you can transform this concatenation into a python list (with library ast), then compute median on this list (with library statistics)
-
I'm guessing that Median is "hard" because it always seems to be overlooked. I'd go further than just Median though and add in things like P5, P10, P25 etc for the various important percentiles.