Count duplicates with a Window recipe

Solved!
OlivierAb
Level 3
Hello,

I would like to count the duplicates in the column "ident_2" (storing the result in a column "ident_2_count") with a Window recipe.

In my example, I expected to get "2" and "2" in the column "ident_2_count" instead of "1" and "2".

screenshot_1.png

I can work around the problem by using a "group" recipe followed by a "join" recipe, but why does this not work with the "window" recipe?
I also noticed that I don't get the same results with the DSS engine (cumulative distribution is always 1) as with the Spark engine (cumulative distribution is 0.5 and 1 if there are 2 duplicates; 0.3, 0.6 and 1 if there are 3 duplicates).

Thank you, Olivier

ZachM
Dataiker

Hi @OlivierAb,

You're likely seeing this result because of the window-frame configuration in your Window recipe. By default, Window recipes only take preceding rows into consideration when calculating aggregations, which is why the count appears to increase one row at a time.

If you want it to give the total count on every row, you can configure your window frame so that it has no limits set.
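
To illustrate the difference outside of DSS, here is a minimal plain-Python sketch. It is not what the Window recipe runs internally; the sample values and the function names are made up for illustration, and it assumes the window is partitioned by "ident_2". The first function mimics a frame limited to the preceding rows plus the current row, and the second mimics a frame with no limits.

```python
from collections import Counter

# Hypothetical sample values of the "ident_2" column, in row order.
ident_2 = ["A", "B", "B", "C"]


def count_preceding_frame(values):
    """Count rows per partition using only the preceding rows and the current row."""
    seen = Counter()
    counts = []
    for value in values:
        seen[value] += 1            # the current row joins its partition's frame
        counts.append(seen[value])  # count of rows seen so far in this partition
    return counts


def count_unbounded_frame(values):
    """Count rows per partition using the whole partition (no frame limits)."""
    totals = Counter(values)        # size of each partition
    return [totals[value] for value in values]


running = count_preceding_frame(ident_2)  # [1, 1, 2, 1]
total = count_unbounded_frame(ident_2)    # [1, 2, 2, 1]
print(running)
print(total)
```

With the limited frame, the two "B" rows get 1 and 2. With the unlimited frame, both get 2, which is the result you were expecting.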

Here are screenshots of my Window recipe configuration:

329EEFDA-1F60-41AB-8959-EF13811DA982.png

 CF40A47A-DF0A-43E4-8DD2-6DCFFA01C014.png

 

And here's a screenshot of my output dataset:

E37F459B-1E57-4C01-83CD-C59D56D20BE3.png

 

 

If changing the Window recipe configuration doesn't resolve the issue for you, could you please provide screenshots of your Window recipe configuration?

Thanks,

Zach

OlivierAb
Level 3
Author

It works perfectly 🙂

Thanks, ZachM!