We're excited to announce that we're launching the second installment of Dataiku Product Days Register Now

Median on grouping data

I suggest Dataiku add a feature to calculate median easily, although it is available inside 'Analyze' for numeric columns.

But if someone wants to find the median of each group after grouping, like we can find min, max & avg, etc.

Adding a median feature after grouping data will definitely help the user by giving them a smooth experience.

6 Comments
AshleyW
Dataiker
Dataiker
Status changed to: In Backlog

 

Hi @PANKAJ ,

Thanks for submitting this idea. You'll be pleased to hear this idea is in our backlog--adding median to the list of available aggregation--and we are determining the next steps for development. We can't provide a timeline at this point, but be sure to check back for updates.

For everyone else, kudos the original post to signal that you're interested in Dataiku developing and releasing this feature!

Take care,

Ashley

tgb417
Neuron
Neuron

@AshleyW & @PANKAJ ,

If we are going to add Median, what about Mode as well?

Second, while we are waiting for this feature. 

Has anyone worked out the syntax to add these calculations to a window recipe as a custom aggregation?  I'm running Postgres as the underlying DBMS.

natejgardner
Level 6

As long as we're adding averages, geometric and harmonic means and RMS would be nice to have also.

@tgb417 for Postgres, try

percentile_cont(0.5) WITHIN GROUP (ORDER BY num_value) as median_value,
mode() WITHIN GROUP (ORDER BY num_value) as modal_value

 

For geometric and harmonic means in Postgres, see this link

tgb417
Neuron
Neuron

@natejgardner ,

I've figured out how to do an Average as a custom aggregation using the following.

Custom aggregation using the following SQL AVG(CAST("track" AS REAL)) OVER $windowCustom aggregation using the following SQL AVG(CAST("track" AS REAL)) OVER $window

 

Note that one needs to drop the explicit column alias

as custome_AVG_1

If you call out the column name in the custom in the SQL chunk you provide for the custom aggregation you break things.

However, I've not yet figured out how to do the precentile_con(0.5) and mode().  For example, if I try something like this:

Trying to create a custom Agrigation for Mode produces error "Invalid custom aggregation: ERROR: WITHIN GROUP is required for ordered-set aggregate mode   Position: 16"Trying to create a custom Agrigation for Mode produces error "Invalid custom aggregation: ERROR: WITHIN GROUP is required for ordered-set aggregate mode Position: 16"

I get an error message.  In a detailed search of the dataiku.com website, I've not found a solution.

 

natejgardner
Level 6

@tgb417 it looks like Postgres doesn't directly support these as window functions, but you can work around it with a join: https://stackoverflow.com/questions/39683330/percentile-calculation-with-a-window-function

Hayet
Level 1

Another workaround :

1. In the group by you can concatenate the values of interest.

2. Then in a prepare recipe, with a python step, you can transform this concatenation into a python list (with library ast), then compute median on this list (with library statistics)