count duplicate recipe window

OlivierAb
OlivierAb Registered Posts: 15 ✭✭✭✭

Hello,

I would like to count the duplicates in the column "ident_2" with a window recipe, storing the result in the column "ident_2_count".

In my example, I expected to have "2" and "2" in the column "ident_2_count" instead of "1" and "2".

screenshot_1.png

I can work around the problem by using a "group" recipe followed by a "join" recipe, but why does this not work with the "window" recipe?
I also noticed that the results differ by engine. With the DSS engine, the cumulative distribution is always 1. With the Spark engine, it is 0.5 and 1 when there are 2 duplicates, and 0.3, 0.6 and 1 when there are 3 duplicates.

Thank you, Olivier

Best Answer

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker
    Answer ✓

    Hi @OlivierAb,

    You're likely seeing this result because of the window frame configuration in your Window recipe. By default, Window recipes only take preceding rows into consideration when calculating aggregations, which is why the count appears to increase one by one.

    If you want it to give the total count on every row, you can configure your window frame so that it has no limits set.
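
To illustrate the difference between the two frame settings, here is a minimal plain-Python sketch. It is not Dataiku code and does not reproduce the Window recipe's internals; the function names `running_count` and `total_count` and the sample values are made up for illustration. It only shows why a frame limited to preceding rows yields 1, 2, ... while an unbounded frame yields the total on every row.

```python
from collections import Counter


def running_count(values):
    """Count with a frame of 'all preceding rows up to the current row'.

    Each row only sees the rows before it (plus itself), so duplicates
    are numbered 1, 2, 3, ... in order of appearance.
    """
    seen = Counter()
    counts = []
    for value in values:
        seen[value] += 1
        counts.append(seen[value])
    return counts


def total_count(values):
    """Count with an unbounded frame: every row sees its whole partition."""
    totals = Counter(values)
    return [totals[value] for value in values]


# Hypothetical "ident_2" values, partitioned by their own value.
ident_2 = ["A", "A", "B", "C", "C", "C"]

print(running_count(ident_2))  # [1, 2, 1, 1, 2, 3]
print(total_count(ident_2))    # [2, 2, 1, 3, 3, 3]
```

The second list matches what a "group" recipe followed by a "join" recipe would produce: the group size repeated on every row of the group.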

    Here are screenshots of my Window recipe configuration:

    329EEFDA-1F60-41AB-8959-EF13811DA982.png

    CF40A47A-DF0A-43E4-8DD2-6DCFFA01C014.png

    And here's a screenshot of my output dataset:

    E37F459B-1E57-4C01-83CD-C59D56D20BE3.png

    If changing the Window recipe configuration doesn't resolve the issue for you, could you please provide screenshots of your Window recipe configuration?

    Thanks,

    Zach
