Median on grouping data

I suggest Dataiku add a feature to calculate median easily, although it is available inside 'Analyze' for numeric columns.

But if someone wants to find the median of each group after grouping, like we can find min, max & avg, etc.

Adding a median feature after grouping data will definitely help the user by giving them a smooth experience.

8 Comments
AshleyW
Dataiker

 

Hi @PANKAJ ,

Thanks for submitting this idea. You'll be pleased to hear this idea is in our backlog--adding median to the list of available aggregation--and we are determining the next steps for development. We can't provide a timeline at this point, but be sure to check back for updates.

For everyone else, kudos the original post to signal that you're interested in Dataiku developing and releasing this feature!

Take care,

Ashley

Status changed to: In Backlog

 

Hi @PANKAJ ,

Thanks for submitting this idea. You'll be pleased to hear this idea is in our backlog--adding median to the list of available aggregation--and we are determining the next steps for development. We can't provide a timeline at this point, but be sure to check back for updates.

For everyone else, kudos the original post to signal that you're interested in Dataiku developing and releasing this feature!

Take care,

Ashley

@AshleyW & @PANKAJ ,

If we are going to add Median, what about Mode as well?

Second, while we are waiting for this feature. 

Has anyone worked out the syntax to add these calculations to a window recipe as a custom aggregation?  I'm running Postgres as the underlying DBMS.

--Tom

@AshleyW & @PANKAJ ,

If we are going to add Median, what about Mode as well?

Second, while we are waiting for this feature. 

Has anyone worked out the syntax to add these calculations to a window recipe as a custom aggregation?  I'm running Postgres as the underlying DBMS.

As long as we're adding averages, geometric and harmonic means and RMS would be nice to have also.

@tgb417 for Postgres, try

percentile_cont(0.5) WITHIN GROUP (ORDER BY num_value) as median_value,
mode() WITHIN GROUP (ORDER BY num_value) as modal_value

 

For geometric and harmonic means in Postgres, see this link

As long as we're adding averages, geometric and harmonic means and RMS would be nice to have also.

@tgb417 for Postgres, try

percentile_cont(0.5) WITHIN GROUP (ORDER BY num_value) as median_value,
mode() WITHIN GROUP (ORDER BY num_value) as modal_value

 

For geometric and harmonic means in Postgres, see this link

@natejgardner ,

I've figured out how to do an Average as a custom aggregation using the following.

Custom aggregation using the following SQL AVG(CAST("track" AS REAL)) OVER $windowCustom aggregation using the following SQL AVG(CAST("track" AS REAL)) OVER $window

 

Note that one needs to drop the explicit column alias

as custome_AVG_1

If you call out the column name in the custom in the SQL chunk you provide for the custom aggregation you break things.

However, I've not yet figured out how to do the precentile_con(0.5) and mode().  For example, if I try something like this:

Trying to create a custom Agrigation for Mode produces error "Invalid custom aggregation: ERROR: WITHIN GROUP is required for ordered-set aggregate mode   Position: 16"Trying to create a custom Agrigation for Mode produces error "Invalid custom aggregation: ERROR: WITHIN GROUP is required for ordered-set aggregate mode Position: 16"

I get an error message.  In a detailed search of the dataiku.com website, I've not found a solution.

 

--Tom

@natejgardner ,

I've figured out how to do an Average as a custom aggregation using the following.

Custom aggregation using the following SQL AVG(CAST("track" AS REAL)) OVER $windowCustom aggregation using the following SQL AVG(CAST("track" AS REAL)) OVER $window

 

Note that one needs to drop the explicit column alias

as custome_AVG_1

If you call out the column name in the custom in the SQL chunk you provide for the custom aggregation you break things.

However, I've not yet figured out how to do the precentile_con(0.5) and mode().  For example, if I try something like this:

Trying to create a custom Agrigation for Mode produces error "Invalid custom aggregation: ERROR: WITHIN GROUP is required for ordered-set aggregate mode   Position: 16"Trying to create a custom Agrigation for Mode produces error "Invalid custom aggregation: ERROR: WITHIN GROUP is required for ordered-set aggregate mode Position: 16"

I get an error message.  In a detailed search of the dataiku.com website, I've not found a solution.

 

@tgb417 it looks like Postgres doesn't directly support these as window functions, but you can work around it with a join: https://stackoverflow.com/questions/39683330/percentile-calculation-with-a-window-function

@tgb417 it looks like Postgres doesn't directly support these as window functions, but you can work around it with a join: https://stackoverflow.com/questions/39683330/percentile-calculation-with-a-window-function

Hayet
Level 1

Another workaround :

1. In the group by you can concatenate the values of interest.

2. Then in a prepare recipe, with a python step, you can transform this concatenation into a python list (with library ast), then compute median on this list (with library statistics)

Another workaround :

1. In the group by you can concatenate the values of interest.

2. Then in a prepare recipe, with a python step, you can transform this concatenation into a python list (with library ast), then compute median on this list (with library statistics)

9squirrels
Level 1

I'm guessing that Median is "hard" because it always seems to be overlooked. I'd go further than just Median though and add in things like P5, P10, P25 etc for the various important percentiles.

I'm guessing that Median is "hard" because it always seems to be overlooked. I'd go further than just Median though and add in things like P5, P10, P25 etc for the various important percentiles.

MichaelG
Community Manager
Community Manager
 
I hope I helped! Do you Know that if I was Useful to you or Did something Outstanding you can Show your appreciation by giving me a KUDOS?

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as โ€˜Accepted Solutionโ€™ to help others like you!
Status changed to: In the Backlog