Median of a column in Dataiku

Partner, L2 Admin, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 26 Partner

Is there any way to find the median of a column of a dataset in Dataiku?

Answers

  • Dataiker Posts: 355 Dataiker
    edited July 2024

    Hi

    If the dataset fits into memory, the easiest way is to use Python, with something like:

    import dataiku
    # Load the whole dataset into a pandas DataFrame (it must fit in memory)
    df = dataiku.Dataset("the_dataset_name").get_dataframe()
    m = df["the_column_name"].median()

    For larger datasets, you can use a Window recipe: order the window on the column, activate the window frame but leave "limit preceding" and "limit following" unchecked, and tick the "cumulative distribution" aggregate. Then filter on that value and use a TopN recipe to keep the first row with a cumulative distribution above 0.5.
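
To make the logic of the cumulative-distribution approach concrete, here is a minimal pure-Python sketch of what the Window + filter + TopN chain computes. It is not Dataiku code: the function name `median_via_cume_dist` and the `threshold` parameter are made up for illustration, and the cumulative distribution is assumed to follow the usual SQL `CUME_DIST()` definition (the fraction of rows whose value is less than or equal to the current row's value).

```python
from bisect import bisect_right


def median_via_cume_dist(values, threshold=0.5):
    """Return the first value, in sorted order, whose cumulative
    distribution is strictly above `threshold`.

    Hypothetical helper for illustration only; it mimics the
    Window (cumulative distribution) + filter + TopN steps.
    """
    ordered = sorted(values)  # "window ordered on the column"
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot take the median of an empty column")
    for v in ordered:
        # Fraction of rows with a value <= v (assumed CUME_DIST semantics)
        cume_dist = bisect_right(ordered, v) / n
        if cume_dist > threshold:  # "filter", then keep the first row ("TopN")
            return v
    raise ValueError("threshold must be below 1.0")


print(median_via_cume_dist([5, 1, 3, 2, 4]))
print(median_via_cume_dist([1, 2, 3, 4]))
```

With an odd number of rows this returns the exact median. With an even number of rows it returns the upper of the two middle values rather than their average, so it can differ slightly from what pandas `median()` reports.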

    If the question is about enriching each row with the median of the column, then you'll have to sync the data to a SQL database and use SQL to perform the operation.
