Custom Aggregation(string_agg)

Solved!
bhakuniv
Level 2
Custom Aggregation(string_agg)

Hi

I am trying to get the following output using the group recipe, for a given ID the output should should be mapped the factor with the maximum score, if more than one factor associated with an ID has the same score and that is the maximum, the output should be have the concatenated value. Tried you custom aggregation (string_agg) is there a way I can specify the max score condition to get the desired output.

Note: My dataset is file based

Input

IDFactorScore
1A50
1B50
1C40
2B50
2C40
3C40

Output

IDFactorScore
1A,B50
2B50
3C40

 

Thanks

VB

0 Kudos
1 Solution
fchataigner2
Dataiker

Hi,

the problem of computing an argmax() boils down to whether the SQL dialect or SparkSQL can offer it, and the answer is often no.

In your case, you should do:

- first a Window recipe, partitioned by ID, sorted by Score, with a unlimited window frame (window frame activated, no upper nor lower limit) and compute the rank aggregate

- filter the rows with rank 1 (either as a post filter in the window recipe or as a pre filter in the grouping)

- group by ID with a concat aggregate

 

Regards,

   Frederic

View solution in original post

0 Kudos
2 Replies
fchataigner2
Dataiker

Hi,

the problem of computing an argmax() boils down to whether the SQL dialect or SparkSQL can offer it, and the answer is often no.

In your case, you should do:

- first a Window recipe, partitioned by ID, sorted by Score, with a unlimited window frame (window frame activated, no upper nor lower limit) and compute the rank aggregate

- filter the rows with rank 1 (either as a post filter in the window recipe or as a pre filter in the grouping)

- group by ID with a concat aggregate

 

Regards,

   Frederic

0 Kudos
bhakuniv
Level 2
Author
Thanks Frederic this works