How to group by several things efficiently

FannyPicamal Registered Posts: 2

Hi all,

I analyse transaction data and I often need to break the metrics by category, while keeping the global value. I want to obtain rows that correspond to the metrics for a precise category of customer (region for example).

Currently I use two different Group recipes to obtain either the values grouped by regions or the global metrics (grouped by nothing) and then stack them. Yet I have sometimes a lot more categories than just regions (channels, brands...) and in these case my previous solution lead to several different group recipes (I believe 2^n group recipes where n is the number of category) which lead to frequent fails in the build of the datasets and a flow difficult to read.

I believe there must be a more efficient way to do the same thing.

I tried using Window recipe but it adds colums and I prefer rows and the Distinct aggregation isn't available (I use it quite often). I also want my individual transaction rows to disappear after I grouped them.

So two questions : Can you see a more efficient way to complete this task with the basic recipes? Can the failed builds originate from something else?

Thank you for your help!


  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,728 Neuron

    What is your use case exactly? If you are going to show the data in a Dataiku Dashboard then Dataiku Insights might be a better solution.

  • FannyPicamal
    FannyPicamal Registered Posts: 2


    I don't use Dataiku for visualisation so I don't use Dataiku dashboards. I only need to share the final table in an excel file.

    I start from a transaction table, where each row is a transaction and each transaction is assigned to a few categories (like region, channel, brand...) and have some metrics (amount, number of units purchased...)

    In the end, I want a table where the avg/sum of the metrics is computed for each combination of category, so for the transaction from region X, channel Y and brand Z, but also for the transaction from channel Y and any region or brand, and any other combinations.

    Currently I use 2^n group recipes where n is the number of categories, which is heavy, and I am looking for a way to compute this final table more efficiently.

    I hope this is clearer

    Thank you!

Setup Info
      Help me…