How to group by several things efficiently

FannyPicamal · ‎02-22-2024

Hi all,

I analyse transaction data and I often need to break the metrics by category, while keeping the global value. I want to obtain rows that correspond to the metrics for a precise category of customer (region for example).

Currently I use two different Group recipes to obtain either the values grouped by regions or the global metrics (grouped by nothing) and then stack them. Yet I have sometimes a lot more categories than just regions (channels, brands...) and in these case my previous solution lead to several different group recipes (I believe 2^n group recipes where n is the number of category) which lead to frequent fails in the build of the datasets and a flow difficult to read.

I believe there must be a more efficient way to do the same thing.

I tried using Window recipe but it adds colums and I prefer rows and the Distinct aggregation isn't available (I use it quite often). I also want my individual transaction rows to disappear after I grouped them.

So two questions : Can you see a more efficient way to complete this task with the basic recipes? Can the failed builds originate from something else?

Thank you for your help!

Turribeach · ‎02-22-2024

What is your use case exactly? If you are going to show the data in a Dataiku Dashboard then Dataiku Insights might be a better solution.

https://doc.dataiku.com/dss/latest/dashboards/insights/index.html

FannyPicamal · ‎02-27-2024

Hi,

I don't use Dataiku for visualisation so I don't use Dataiku dashboards. I only need to share the final table in an excel file.

I start from a transaction table, where each row is a transaction and each transaction is assigned to a few categories (like region, channel, brand...) and have some metrics (amount, number of units purchased...)

In the end, I want a table where the avg/sum of the metrics is computed for each combination of category, so for the transaction from region X, channel Y and brand Z, but also for the transaction from channel Y and any region or brand, and any other combinations.

Currently I use 2^n group recipes where n is the number of categories, which is heavy, and I am looking for a way to compute this final table more efficiently.

I hope this is clearer

Thank you!

Sign up to take part

How to group by several things efficiently

How to group by several things efficiently