Groupby and transform
I have a dataset with USER and ITEMS columns. I wish to perform a groupby and use a size() or count() aggregation to find the count in each group, and then create a new column in the original dataset holding this count. In pandas, this is accomplished with the `transform` method. How is this done in Dataiku using any of the recipes (ideally the Prepare recipe)? Thanks.
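For reference, the pandas pattern described above can be sketched as follows. The sample rows and the `user_count` column name are made up for illustration; only the USER and ITEMS column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "USER": ["a", "a", "b", "c", "c", "c"],
    "ITEMS": ["x", "y", "x", "z", "x", "y"],
})

# transform returns one value per original row, so the per-group count
# can be assigned straight back as a new column.
# "count" counts non-null ITEMS values within each USER group.
df["user_count"] = df.groupby("USER")["ITEMS"].transform("count")

print(df)
```

Every row keeps its original position and simply gains the count for its USER group.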
Operating system used: macOS Ventura
Best Answer
Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
Hi Erlebacher,
The Prepare Recipe is primarily used for row-level operations, though some of its processors can operate across rows. For aggregations, the Group or Window Recipes are more appropriate. Both can do counts and other aggregations out of the box. Moreover, you can write your own custom aggregations.
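As a rough analogy in pandas terms (not Dataiku code): a Group Recipe produces one row per group, so getting the count back onto every original row would take a further join, whereas a Window Recipe partitioned by USER keeps every row and adds the aggregate alongside it, which is closer to `transform`. The sketch below shows the group-then-join route; the sample data and the `user_count` name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "USER": ["a", "a", "b", "c", "c", "c"],
    "ITEMS": ["x", "y", "x", "z", "x", "y"],
})

# Step 1 (analogous to a Group Recipe): one row per USER with its count.
counts = df.groupby("USER", as_index=False).agg(user_count=("ITEMS", "count"))

# Step 2 (analogous to a Join Recipe): left-join the counts back onto
# the original rows, so every row carries its group's count.
result = df.merge(counts, on="USER", how="left")

print(result)
```

The result matches what `transform` would give, at the cost of an intermediate grouped table.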
As for performing transformations in the original dataset, this runs counter to the way the Flow is laid out, where each recipe reads from an input dataset and writes to a separate output dataset. You can sometimes work around this by setting the recipe's output dataset to point to the same data location as the recipe's input. However, overlapping datasets can cause problems for the Flow's lineage.