Is there any way to find the median of a column of a dataset in Dataiku?
Hi
If the dataset fits into memory, the easiest approach is to use Python, with something like:
import dataiku
# Load the whole dataset into a pandas DataFrame (it must fit in memory)
df = dataiku.Dataset("the_dataset_name").get_dataframe()
m = df["the_column_name"].median()
For larger datasets, you can use a Window recipe: set the window to be ordered on the column, activate the window frame but leave limit preceding and limit following unchecked, and tick the "cumulative distribution" aggregate. Then filter on the rows whose cumulative distribution value is above 0.5 and use a TopN recipe to keep the first of them.
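To make the logic of that recipe chain concrete, here is a minimal pure-Python sketch of the same idea. It is not what Dataiku runs internally; the function name median_by_cume_dist is made up for illustration, and it assumes the cumulative distribution of a value is the share of rows with a value less than or equal to it.

```python
from collections import Counter


def median_by_cume_dist(values):
    """Return the smallest value whose cumulative distribution exceeds 0.5.

    Hypothetical helper: it mimics ordering on the column, computing the
    cumulative distribution, filtering on > 0.5, and keeping the first row.
    """
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    counts = Counter(values)
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        cume_dist = seen / n  # share of rows with a value <= this one
        if cume_dist > 0.5:
            return value
    # Not reachable: the largest value always has a cume_dist of 1.0.


print(median_by_cume_dist([5, 1, 3]))
```

Note that with an even number of rows this picks the upper of the two middle values rather than their average, so it can differ slightly from what pandas median() returns.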
If the question is about enriching each row with the median of the column, then you'll have to sync the data to a SQL database and use SQL to perform the operation.
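As a rough sketch of what such a query could look like, the example below uses Python's built-in sqlite3 module as a stand-in for the SQL database. The sales table and its id and amount columns are hypothetical, and the median is computed with the same cumulative distribution trick as above (it assumes a SQLite build with window function support, 3.25 or later). Your own database may offer a dedicated median or percentile function that would be simpler.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (id, amount) VALUES (?, ?)",
    [(1, 40), (2, 10), (3, 50), (4, 20), (5, 30)],
)

ENRICH_WITH_MEDIAN = """
WITH ranked AS (
    -- Cumulative distribution of each amount over the whole table
    SELECT amount, CUME_DIST() OVER (ORDER BY amount) AS cd
    FROM sales
),
med AS (
    -- Smallest amount whose cumulative distribution is above 0.5
    SELECT MIN(amount) AS median_amount FROM ranked WHERE cd > 0.5
)
-- Attach the single median value to every row
SELECT s.id, s.amount, m.median_amount
FROM sales AS s CROSS JOIN med AS m
ORDER BY s.id
"""

rows = conn.execute(ENRICH_WITH_MEDIAN).fetchall()
conn.close()

for row in rows:
    print(row)
```

Each output row carries the original columns plus the column-wide median, which is the "enrich each row" shape described above.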