Window recipe returns wrong results
Hi everyone,
I have an issue, or at least I think it might be an issue, unless we reject H0.
If you go through the hands-on Window Recipe tutorial at https://academy.dataiku.com/visual-recipes-102/671236, after the chapter "Apply a Post-Filter in a Visual Recipe", where you start exploring your dataset, I came across values that do not match, and I don't understand why. Please check the 3 attached screenshots: one with good results, and the 2nd and 3rd showing wrong results. The min, max and avg values do not correspond to what I compute from the data displayed in the table, unless the calculation uses data from somewhere else. For example, in screenshot 1, card_purchase_amount_min = 4.6 and card_purchase_amount_max = 945. Where did those values come from? Only the first record, which is the first card ID, is correct: 32.99. Is this a bug, or am I misunderstanding something?
Answers
Hi Carl,
The behavior you are seeing is due to the default sampling setting in Explore. The window recipe calculates the aggregated values on the whole dataset, but Explore only shows the first 10,000 records by default. The aggregates can therefore appear "incorrect" when you compare them against the visible rows, as you reported.
To verify this, you can, for example, change the sampling setting to show only the records where card_id is C_ID_6e530c88db, as shown in the screenshot below. You should then see all the records for this particular card_id and can confirm that the min is 4.6, the max is 945, and so on. The same check should work for the other card_id values, too.
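To illustrate why the numbers look off, here is a minimal sketch in plain Python. It is not DSS code and does not use any Dataiku API. The rows are made-up toy data (only the card_id and the values 4.6, 32.99 and 945 echo this thread), and a sample size of 3 stands in for the default first 10,000 records.

```python
from collections import defaultdict

# Hypothetical toy data: (card_id, purchase_amount).
# Only C_ID_6e530c88db, 4.6, 32.99 and 945 come from the thread; the rest is invented.
rows = [
    ("C_ID_6e530c88db", 32.99),
    ("C_ID_aaaaaaaaaa", 10.0),
    ("C_ID_aaaaaaaaaa", 20.0),
    ("C_ID_6e530c88db", 4.6),
    ("C_ID_6e530c88db", 945.0),
]

def window_aggregates(rows):
    """Attach per-card min/max/avg to every row, like a window partitioned by card_id."""
    groups = defaultdict(list)
    for card_id, amount in rows:
        groups[card_id].append(amount)
    stats = {cid: (min(v), max(v), sum(v) / len(v)) for cid, v in groups.items()}
    return [(cid, amount) + stats[cid] for cid, amount in rows]

# Aggregates are computed on the WHOLE dataset...
full = window_aggregates(rows)

# ...but only the first few rows are displayed (assumed stand-in for the default sample).
SAMPLE_SIZE = 3
sample = full[:SAMPLE_SIZE]

card = "C_ID_6e530c88db"
visible = [amount for cid, amount, *_ in sample if cid == card]
stored_min, stored_max = sample[0][2], sample[0][3]

# Only one purchase for this card is visible, yet the stored min/max cover all of them.
print("visible amounts:", visible)
print("stored min/max: ", stored_min, stored_max)

# Filtering on the card (the equivalent of changing the sampling) shows every row,
# and the aggregates now agree with what is displayed.
all_for_card = [amount for cid, amount, *_ in full if cid == card]
print("all amounts:    ", all_for_card)
```

In the sampled view only one purchase for the card is visible, so a min of 4.6 and a max of 945 seem to come from nowhere. Once all rows for that card are shown, the stored aggregates match the displayed data.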
For more details on sampling in Explore, please refer to the following documentation:
https://doc.dataiku.com/dss/latest/explore/sampling.html