Multi-column pivots - aggregate mode

Currently, multi-column pivots create a column for every combination of values across all selected pivot columns. My use case is different: I'd like one column per value of each pivot column, with aggregates computed for each pivot column independently. This would be helpful for feature generation, where I'm interested in representing the distribution of a particular dimension for each record. This effect can currently be achieved by creating one pivot recipe per pivot column, then joining the outputs together on the row identifiers.

1 Comment
natejgardner

Here's an example of the current workaround:

[Screenshot 2021-05-19 184559.jpg]

I create multiple pivot recipes to generate my aggregates, then join the results back together to create one table with all the fields I want.
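To make the target shape concrete outside of DSS, here is a rough pandas sketch of the same "one pivot per column, then join" idea. It is not Dataiku's API or how the pivot recipe works internally; the helper name `pivot_each_column` and the sample columns (`customer_id`, `channel`, `category`, `amount`) are made up for illustration.

```python
import pandas as pd


def pivot_each_column(df, id_col, pivot_cols, value_col, aggfunc="sum"):
    """Pivot each column in pivot_cols independently, then join on id_col.

    Hypothetical helper: mirrors the multi-recipe workaround, not a DSS feature.
    """
    parts = []
    for col in pivot_cols:
        # One pivot per column: rows are identifiers, columns are that
        # column's distinct values (sorted), cells are the aggregate.
        wide = df.pivot_table(
            index=id_col,
            columns=col,
            values=value_col,
            aggfunc=aggfunc,
            fill_value=0,
        )
        # Prefix with the source column so names stay unique after the join.
        wide.columns = [f"{col}_{value}_{aggfunc}" for value in wide.columns]
        parts.append(wide)

    # Join all pivots side by side on the shared row identifier.
    joined = pd.concat(parts, axis=1)
    joined.index.name = id_col
    return joined.reset_index()


# Made-up sample data: one row per purchase.
records = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 2],
        "channel": ["web", "store", "web", "web", "store"],
        "category": ["toys", "toys", "books", "books", "books"],
        "amount": [10, 20, 5, 7, 3],
    }
)

features = pivot_each_column(
    records, "customer_id", ["channel", "category"], "amount"
)
print(features)
```

With two channels and two categories, this gives 2 + 2 = 4 feature columns per customer, whereas pivoting on both columns at once would give one column per (channel, category) combination.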

It would be great if the pivot recipe could handle this directly. I think it'd be really useful for creating a normalized set of features to train ML models on.