I am trying to get the following output using the group recipe, for a given ID the output should should be mapped the factor with the maximum score, if more than one factor associated with an ID has the same score and that is the maximum, the output should be have the concatenated value. Tried you custom aggregation (string_agg) is there a way I can specify the max score condition to get the desired output.
Note: My dataset is file based
the problem of computing an argmax() boils down to whether the SQL dialect or SparkSQL can offer it, and the answer is often no.
In your case, you should do:
- first a Window recipe, partitioned by ID, sorted by Score, with a unlimited window frame (window frame activated, no upper nor lower limit) and compute the rank aggregate
- filter the rows with rank 1 (either as a post filter in the window recipe or as a pre filter in the grouping)
- group by ID with a concat aggregate