Compute median without code recipe

malalearning Registered Posts: 7

Hi everyone,

We are having trouble computing the median on large datasets in Dataiku. Specifically, we do not know how to compute the median of a dataset's columns without using Python or PySpark. Is there a way to do this with recipes, ideally an efficient one? Thanks, everybody.

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
    edited July 17

    Hi,
    Besides Python/Spark, if your datasets are SQL datasets, you can use a SQL recipe.

    For example:

    SELECT 
        MEDIAN("tshirt_price") as "median_tshirt_price",
        SUM("tshirt_quantity") as "total_tshirt_quantity",
        COUNT(*) as "total_orders"
    FROM "PUBLIC"."table_name"
    WHERE "tshirt_price" IS NOT NULL


    Result: [screenshot: Screen Shot 2023-04-05 at 11.45.02 AM.png]
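
    Not every SQL engine has a MEDIAN function. As a rough sketch only, the same query can be written with the ordered-set aggregate PERCENTILE_CONT. This assumes the database supports it as a plain aggregate (PostgreSQL, Oracle and Snowflake do; SQL Server only offers it as a window function). The table and column names are simply reused from the query above.

    ```sql
    -- Sketch: median via PERCENTILE_CONT(0.5) instead of MEDIAN.
    -- Assumes the engine supports ordered-set aggregates (WITHIN GROUP).
    -- Table and column names are taken from the example query above.
    SELECT
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "tshirt_price") AS "median_tshirt_price",
        SUM("tshirt_quantity") AS "total_tshirt_quantity",
        COUNT(*) AS "total_orders"
    FROM "PUBLIC"."table_name"
    WHERE "tshirt_price" IS NOT NULL;
    ```

    With an even number of rows, PERCENTILE_CONT(0.5) interpolates between the two middle values, so it should return the average of the two.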

  • malalearning
    malalearning Registered Posts: 7

    Thanks Alex, but our datasets are not SQL datasets.
