How to group by several things efficiently
Hi all,
I analyse transaction data and I often need to break the metrics by category, while keeping the global value. I want to obtain rows that correspond to the metrics for a precise category of customer (region for example).
Currently I use two different Group recipes to obtain either the values grouped by regions or the global metrics (grouped by nothing) and then stack them. Yet I have sometimes a lot more categories than just regions (channels, brands...) and in these case my previous solution lead to several different group recipes (I believe 2^n group recipes where n is the number of category) which lead to frequent fails in the build of the datasets and a flow difficult to read.
I believe there must be a more efficient way to do the same thing.
I tried using Window recipe but it adds colums and I prefer rows and the Distinct aggregation isn't available (I use it quite often). I also want my individual transaction rows to disappear after I grouped them.
So two questions : Can you see a more efficient way to complete this task with the basic recipes? Can the failed builds originate from something else?
Thank you for your help!
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron
What is your use case exactly? If you are going to show the data in a Dataiku Dashboard then Dataiku Insights might be a better solution.
https://doc.dataiku.com/dss/latest/dashboards/insights/index.html
-
Hi,
I don't use Dataiku for visualisation so I don't use Dataiku dashboards. I only need to share the final table in an excel file.
I start from a transaction table, where each row is a transaction and each transaction is assigned to a few categories (like region, channel, brand...) and have some metrics (amount, number of units purchased...)
In the end, I want a table where the avg/sum of the metrics is computed for each combination of category, so for the transaction from region X, channel Y and brand Z, but also for the transaction from channel Y and any region or brand, and any other combinations.
Currently I use 2^n group recipes where n is the number of categories, which is heavy, and I am looking for a way to compute this final table more efficiently.
I hope this is clearer
Thank you!