Custom Aggregation(string_agg)

bhakuniv
bhakuniv Registered Posts: 9 ✭✭✭✭

Hi

I am trying to get the following output using the group recipe, for a given ID the output should should be mapped the factor with the maximum score, if more than one factor associated with an ID has the same score and that is the maximum, the output should be have the concatenated value. Tried you custom aggregation (string_agg) is there a way I can specify the max score condition to get the desired output.

Note: My dataset is file based

Input

IDFactorScore
1A50
1B50
1C40
2B50
2C40
3C40

Output

IDFactorScore
1A,B50
2B50
3C40

Thanks

VB

Best Answer

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Answer ✓

    Hi,

    the problem of computing an argmax() boils down to whether the SQL dialect or SparkSQL can offer it, and the answer is often no.

    In your case, you should do:

    - first a Window recipe, partitioned by ID, sorted by Score, with a unlimited window frame (window frame activated, no upper nor lower limit) and compute the rank aggregate

    - filter the rows with rank 1 (either as a post filter in the window recipe or as a pre filter in the grouping)

    - group by ID with a concat aggregate

    Regards,

    Frederic

Answers

Setup Info
    Tags
      Help me…