Hi everyone,
We have an issue computing the median in Dataiku on large datasets. In particular, we do not know how to compute the median of a dataset's columns without using Python or PySpark. Is there a way to do this with recipes, ideally an efficient one? Thanks, everybody.
Hi,
Besides Python/Spark, if your datasets are stored in a SQL database, you can use a SQL recipe.
For example:
```sql
SELECT
    MEDIAN("tshirt_price") AS "median_tshirt_price",
    SUM("tshirt_quantity") AS "total_tshirt_quantity",
    COUNT(*) AS "total_orders"
FROM "PUBLIC"."table_name"
WHERE "tshirt_price" IS NOT NULL;
```
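Not every SQL engine has a MEDIAN aggregate: Snowflake, Oracle and Redshift do, while PostgreSQL and SQLite do not. A minimal sketch for such engines ranks the non-NULL values with window functions and averages the middle one or two. The CTE "sample_orders" and its values are hypothetical stand-ins for your own table, and the query assumes window-function support (for example PostgreSQL, or SQLite 3.25+); it has not been checked against every dialect. On PostgreSQL, `PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "tshirt_price")` gives the same result as a single aggregate.

```sql
-- Sketch: median via window functions, for SQL engines without MEDIAN().
-- "sample_orders" is hypothetical sample data; replace it with your own table,
-- e.g. "PUBLIC"."table_name".
WITH "sample_orders" AS (
    SELECT 10 AS "tshirt_price"
    UNION ALL SELECT 40
    UNION ALL SELECT 20
    UNION ALL SELECT 30
    UNION ALL SELECT NULL          -- NULLs are filtered out below, as MEDIAN() would ignore them
),
"ordered" AS (
    SELECT
        "tshirt_price",
        ROW_NUMBER() OVER (ORDER BY "tshirt_price") AS "rn",   -- 1-based rank in sorted order
        COUNT(*) OVER ()                            AS "cnt"   -- number of non-NULL rows
    FROM "sample_orders"
    WHERE "tshirt_price" IS NOT NULL
)
-- Keep the middle row (odd count) or the two middle rows (even count):
-- 2 * rn equals cnt + 1 for an odd count, or cnt and cnt + 2 for an even count.
-- Averaging the kept rows yields the median.
SELECT AVG("tshirt_price") AS "median_tshirt_price"
FROM "ordered"
WHERE 2 * "rn" IN ("cnt", "cnt" + 1, "cnt" + 2);
```

With the sample values 10, 20, 30 and 40 (plus one NULL), the query returns 25, the mean of the two middle values. Note that ranking requires sorting the whole column, so on very large tables its cost is roughly that of a full sort.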
Thanks Alex, but our datasets are not SQL datasets.