We have an issue computing the median in Dataiku on large datasets. In particular, we do not know how to compute the median of the columns in a dataset without using Python or PySpark. Is there a recipe-based method that can do this, ideally an efficient one? Thanks, everybody.
Besides Python/Spark: if your datasets are stored in a SQL database, you can use a SQL recipe.
For example:
SELECT
    MEDIAN("tshirt_price") AS "median_tshirt_price",
    SUM("tshirt_quantity") AS "total_tshirt_quantity",
    COUNT(*) AS "total_orders"
FROM "PUBLIC"."table_name"
WHERE "tshirt_price" IS NOT NULL
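Note that MEDIAN() is not standard SQL. Snowflake, Oracle and Redshift provide it, but other engines do not. PostgreSQL, for instance, uses PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ...) instead. If your database has neither, a median can usually be computed with plain window functions. The sketch below shows that approach. It is illustrative only and assumes a hypothetical table `orders` with a `tshirt_price` column. It runs the query through Python's built-in sqlite3 module purely so it can be tried locally, which requires SQLite 3.25 or later for window functions. Only the SQL string is what you would paste into a SQL recipe, adapting the table and column names to your own.

```python
import sqlite3

# Hypothetical table/column names (orders, tshirt_price) for illustration only.
# The query needs only ROW_NUMBER() and COUNT(*) OVER (), so it can stand in
# for MEDIAN() on engines that lack it.
MEDIAN_SQL = """
WITH ordered AS (
    SELECT tshirt_price,
           ROW_NUMBER() OVER (ORDER BY tshirt_price) AS rn,
           COUNT(*) OVER () AS n
    FROM orders
    WHERE tshirt_price IS NOT NULL   -- applied before the window functions
)
SELECT AVG(tshirt_price) AS median_tshirt_price
FROM ordered
-- odd n: both expressions give the middle row; even n: the two middle rows
WHERE rn IN ((n + 1) / 2, (n + 2) / 2)
"""


def median_price(prices):
    """Load prices into a throwaway in-memory table and return the SQL median."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (tshirt_price REAL)")
        conn.executemany("INSERT INTO orders VALUES (?)", [(p,) for p in prices])
        (median,) = conn.execute(MEDIAN_SQL).fetchone()
        return median  # None when there are no non-NULL prices
    finally:
        conn.close()


if __name__ == "__main__":
    print(median_price([10.0, 30.0, 20.0]))        # → 20.0 (odd count)
    print(median_price([10.0, 40.0, 20.0, 30.0]))  # → 25.0 (even count)
    print(median_price([5.0, None, 15.0]))         # → 10.0 (NULLs ignored)
```

The query relies on integer division of `n`. For an odd count, both `(n + 1) / 2` and `(n + 2) / 2` point to the same middle row. For an even count, they select the two middle rows, and AVG returns their mean. The sort behind ROW_NUMBER() runs inside the database, so this should still scale reasonably well on large tables. The exact performance depends on your engine.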