Count duplicates with a Window recipe
Hello,
I would like to count duplicates in the column "ident_2" with a Window recipe and store the result in the column "ident_2_count".
In my example, I expected to get "2" and "2" in the column "ident_2_count" instead of "1" and "2".
I can work around the problem by using a "group" recipe followed by a "join" recipe, but why does this not work with the "window" recipe?
I also noticed that the results differ between engines: with the DSS engine the cumulative distribution is always 1, while with the Spark engine it is 0.5 and 1 if there are 2 duplicates, and 0.3, 0.6 and 1 if there are 3 duplicates.
Thank you, Olivier
Best Answer
Hi @OlivierAb,
You're likely seeing this result because of the window-frame configuration in your Window recipe. By default, Window recipes only take preceding rows into consideration when calculating aggregations, which is why it appears to count one by one.
If you want it to give the total count on every row, you can configure your window frame so that it has no limits set.
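To illustrate the difference, here is a minimal plain-Python sketch of the two frame settings. It is not how DSS computes the window internally, and the sample rows and the helper names `count_preceding_only` and `count_unbounded` are made up for this example; only the column names "ident_2" and "ident_2_count" come from the question.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: two rows share the same ident_2 value.
rows = [
    {"ident_1": 1, "ident_2": "A"},
    {"ident_1": 2, "ident_2": "A"},
    {"ident_1": 3, "ident_2": "B"},
]


def count_preceding_only(rows, key):
    """Frame limited to preceding rows plus the current row: a running count."""
    seen = defaultdict(int)
    counts = []
    for row in rows:
        seen[row[key]] += 1
        counts.append(seen[row[key]])
    return counts


def count_unbounded(rows, key):
    """Frame with no limits: every row sees its whole partition."""
    totals = Counter(row[key] for row in rows)
    return [totals[row[key]] for row in rows]


running = count_preceding_only(rows, "ident_2")
total = count_unbounded(rows, "ident_2")

print(running)  # [1, 2, 1] -> the "1" and "2" seen in ident_2_count
print(total)    # [2, 2, 1] -> the expected "2" and "2"
```

With the limited frame, the first duplicate only sees itself, so it gets 1. With no limits, both duplicates see the full partition and get 2.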
Here are screenshots of my Window recipe configuration:
And here's a screenshot of my output dataset:
If changing the Window recipe configuration doesn't resolve the issue for you, could you please provide screenshots of your Window recipe configuration?
Thanks,
Zach
Answers
It works perfectly.
Thanks, ZachM!