Multi-column pivots - aggregate mode

natejgardner · 05-12-2021

Currently, multi-column pivots create a column for every combination of values across all selected pivot columns. My use case is different: I'd like just one column per value of each pivot column, with aggregates computed for each value independently. This would be helpful for feature generation, where I'm interested in representing the distribution of a particular dimension for each record. The effect can currently be achieved by creating one pivot recipe per pivot column, then joining the results together on the row identifiers.

natejgardner · 05-20-2021

Here's an example of the current workaround:

I create multiple pivot recipes to generate my aggregates, then join the results back together to create one table with all the fields I want.
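To make the intended output shape concrete, here is a rough sketch of the same idea in pandas rather than in the visual recipes. The data, the column names (`customer_id`, `channel`, `category`), and the helper `independent_pivot` are all made up for illustration, and a count is just one possible aggregate.

```python
import pandas as pd

# Hypothetical data: one row per event, keyed by customer_id.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "channel": ["web", "store", "web", "web", "store"],
    "category": ["toys", "books", "toys", "toys", "books"],
})


def independent_pivot(frame, key, pivot_cols):
    """Pivot each column on its own, then join the results on the key.

    Hypothetical helper: it mirrors the "one pivot per column, then join"
    workaround, using a per-value count as the aggregate.
    """
    parts = []
    for col in pivot_cols:
        # One pivot per column: rows are keys, columns are that column's values.
        counts = pd.crosstab(frame[key], frame[col])
        counts.columns = [f"{col}_{value}_count" for value in counts.columns]
        parts.append(counts)
    # Join all per-column pivots back together on the row identifier.
    return pd.concat(parts, axis=1).reset_index()


features = independent_pivot(df, "customer_id", ["channel", "category"])
print(features)
```

With two pivot columns of two values each, this gives four feature columns (2 + 2), one per value. A combined pivot would instead produce a column for each combination of `channel` and `category` values.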

It would be great if the pivot recipe could handle this directly. I think it'd be really useful for creating a normalized set of features to train ML models on.

Labels

Data Exploration and Preparation
