Groupby and transform

Erlebacher Registered Posts: 82 ✭✭

I have a dataset with USER and ITEMS. I want to perform a groupby and use a size() or count() aggregation to find the count in each group, and then add that count as a new column in the original dataset. In pandas, this is done with the `transform` method. How can this be done in Dataiku with one of the recipes (ideally the Prepare recipe)? Thanks.
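For reference, the pandas behaviour described above can be sketched roughly as follows. The sample rows, the choice of USER as the grouping key, and the ITEMS_PER_USER column name are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data; only the USER and ITEMS column names come from the post.
df = pd.DataFrame({
    "USER": ["a", "a", "b", "c", "c", "c"],
    "ITEMS": ["x", "y", "x", "z", "x", "y"],
})

# transform returns one value per original row, so the per-group count
# can be assigned straight back as a new column.
# "count" ignores nulls in ITEMS; ITEMS_PER_USER is a made-up column name.
df["ITEMS_PER_USER"] = df.groupby("USER")["ITEMS"].transform("count")

print(df)
```

The row count is unchanged; every row simply carries the count of its own group.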


Operating system used: macOS Ventura


Best Answer

  • Miguel Angel
    Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
    Answer ✓

    Hi Erlebacher,

    The Prepare recipe is primarily used for row-level operations, though some processors can operate across rows. For aggregations, the Group or Window recipes are more appropriate. Both can compute counts and other aggregations out of the box, and you can also write your own custom aggregations.

    Regarding writing the transformation back into the original dataset, this runs counter to the way the Flow is laid out. You can sometimes still power through by pointing the recipe's output dataset at the same data location as the recipe's input. However, overlapping datasets can cause problems for the Flow's lineage.
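As a rough pandas analogy for the two recipe types named in the answer (not Dataiku code): a Group-style aggregation collapses the data to one row per group, while a Window-style result keeps every row and attaches the group count to it. The sketch below writes to a separate output frame rather than overwriting the input. The sample data, the ITEM_COUNT column name, and the merge step used to reattach the counts are assumptions for illustration; the answer itself does not describe a join.

```python
import pandas as pd

# Hypothetical input; only the USER and ITEMS column names come from the thread.
rows = pd.DataFrame({
    "USER": ["a", "a", "b", "c", "c", "c"],
    "ITEMS": ["x", "y", "x", "z", "x", "y"],
})

# Group-style result: one row per USER with its count.
# ITEM_COUNT is a made-up column name.
counts = rows.groupby("USER", as_index=False).agg(ITEM_COUNT=("ITEMS", "count"))

# Window-style result: every original row kept, with the count alongside.
# A left merge preserves the row order of `rows`.
enriched = rows.merge(counts, on="USER", how="left")

print(counts)
print(enriched)
```

Here `counts` has one row per user, while `enriched` has the same number of rows as the input plus the extra count column, which is the shape the question asks for.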
