I suggest Dataiku add a feature to calculate median easily, although it is available inside 'Analyze' for numeric columns.
But if someone wants to find the median of each group after grouping, like we can find min, max & avg, etc.
Adding a median feature after grouping data will definitely help the user by giving them a smooth experience.
Hi @PANKAJ ,
Thanks for submitting this idea. You'll be pleased to hear this idea is in our backlog--adding median to the list of available aggregation--and we are determining the next steps for development. We can't provide a timeline at this point, but be sure to check back for updates.
For everyone else, kudos the original post to signal that you're interested in Dataiku developing and releasing this feature!
@AshleyW & @PANKAJ ,
If we are going to add Median, what about Mode as well?
Second, while we are waiting for this feature.
Has anyone worked out the syntax to add these calculations to a window recipe as a custom aggregation? I'm running Postgres as the underlying DBMS.
As long as we're adding averages, geometric and harmonic means and RMS would be nice to have also.
@tgb417 for Postgres, try
percentile_cont(0.5) WITHIN GROUP (ORDER BY num_value) as median_value,
mode() WITHIN GROUP (ORDER BY num_value) as modal_value
For geometric and harmonic means in Postgres, see this link
I've figured out how to do an Average as a custom aggregation using the following.
Custom aggregation using the following SQL AVG(CAST("track" AS REAL)) OVER $window
Note that one needs to drop the explicit column alias
If you call out the column name in the custom in the SQL chunk you provide for the custom aggregation you break things.
However, I've not yet figured out how to do the precentile_con(0.5) and mode(). For example, if I try something like this:
Trying to create a custom Agrigation for Mode produces error "Invalid custom aggregation: ERROR: WITHIN GROUP is required for ordered-set aggregate mode Position: 16"
I get an error message. In a detailed search of the dataiku.com website, I've not found a solution.
@tgb417 it looks like Postgres doesn't directly support these as window functions, but you can work around it with a join: https://stackoverflow.com/questions/39683330/percentile-calculation-with-a-window-function
Another workaround :
1. In the group by you can concatenate the values of interest.
2. Then in a prepare recipe, with a python step, you can transform this concatenation into a python list (with library ast), then compute median on this list (with library statistics)
I'm guessing that Median is "hard" because it always seems to be overlooked. I'd go further than just Median though and add in things like P5, P10, P25 etc for the various important percentiles.