Median of a column in Dataiku

PANKAJ Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

Is there any way to find the median of a column of a dataset in Dataiku?

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    edited July 17

    Hi,

    If the dataset fits into memory, the easiest way is to use Python, with something like:

    import dataiku
    # Load the whole dataset into a pandas DataFrame (it must fit in memory)
    df = dataiku.Dataset("the_dataset_name").get_dataframe()
    m = df["the_column_name"].median()

    For larger datasets, you can use a Window recipe: order the window on the column, activate the window frame but leave limit preceding and limit following unchecked, and tick the "cumulative distribution" aggregate. Then filter the output and use a TopN recipe to get the first row with a cumulative distribution value above 0.5.
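To make the logic of the Window recipe approach concrete, here is a minimal plain-Python sketch of the same idea: sort the values, compute each value's cumulative distribution (the share of rows less than or equal to it), and keep the first value whose share exceeds 0.5. The function name `median_via_cume_dist` is hypothetical and only illustrates the rule; it is not part of the Dataiku API.

```python
from bisect import bisect_right


def median_via_cume_dist(values):
    """Return the first value whose cumulative distribution exceeds 0.5."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot take the median of an empty column")
    for v in ordered:
        # Cumulative distribution: share of rows with a value <= v
        cume_dist = bisect_right(ordered, v) / n
        if cume_dist > 0.5:
            return v


print(median_via_cume_dist([5, 1, 3]))
print(median_via_cume_dist([1, 2, 3, 4]))
```

Note that with an even number of rows this rule returns the upper of the two middle values rather than their average, so the result can differ slightly from what pandas' `median()` gives.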

    If the question is about enriching each row with the median of the column, then you'll have to sync the data to a SQL database and use SQL to perform the operation.
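As a rough illustration of what such a SQL operation could look like, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the SQL database. The table `t`, its columns `id` and `val`, and the function `add_median_column` are all hypothetical. It assumes a SQLite version with window function support (3.25 or later), and the exact SQL will vary with the database you sync to; many databases also provide a dedicated percentile function that may be simpler.

```python
import sqlite3


def add_median_column(rows):
    """Return (id, val, median_val) for each (id, val) row, computed in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, val REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        WITH ranked AS (
            -- cumulative distribution of each value within the column
            SELECT val, CUME_DIST() OVER (ORDER BY val) AS cd FROM t
        ),
        med AS (
            -- smallest value whose cumulative distribution exceeds 0.5
            SELECT MIN(val) AS median_val FROM ranked WHERE cd > 0.5
        )
        -- attach the single median value to every row
        SELECT t.id, t.val, med.median_val
        FROM t CROSS JOIN med
        ORDER BY t.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in add_median_column([(1, 10.0), (2, 30.0), (3, 20.0)]):
    print(row)
```

The median is computed once in the `med` subquery and then cross-joined to the table, so every row carries the same value in `median_val`.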
