How to filter the max date in the column using prepare recipe
Hi,
I would like to filter or flag the max date in the column using the prepare recipe.
For example:
Below is my dataset. I would like to filter or flag the rows with the max (latest) date in the "Date" column and show the corresponding values from the other columns, as in the expected output further down.
Code | ID | Date |
BT01 | A | 2022-04-30T00:00:00.000Z |
BT01 | d | 2022-04-30T00:00:00.000Z |
BT01 | e | 2022-03-30T00:00:00.000Z |
BT01 | f | 2022-04-30T00:00:00.000Z |
BT01 | g | 2022-05-30T00:00:00.000Z |
BT01 | h | 2022-05-30T00:00:00.000Z |
Expected output:
Code | ID | Date |
BT01 | g | 2022-05-30T00:00:00.000Z |
BT01 | h | 2022-05-30T00:00:00.000Z |
Currently I am using the Top N recipe to identify the max date and then joining the result with the previous dataset to fetch the other columns. This feels like a long route to the result, so I would appreciate comments from the experts.
Thanks,
Prabhakaran
Operating system used: Windows
Answers
Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
If you already know the maximum/latest date you are looking for, you can use the "Filter rows/cells on date" processor.
However, if you do not know the max value of that column in advance, the prepare recipe is not the best option. This is because (most of) its processors operate row by row. The kind of aggregation needed to calculate a max, min, avg, etc. across rows is not available in this recipe unless the values being compared are contained in the same row.
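If writing code is an option, one possible alternative is to do the aggregation and the filter in a single step with pandas, for example inside a Python recipe. The snippet below is only a minimal sketch: it rebuilds the sample rows from the question in memory, and the `is_max_date` flag column is a name made up for illustration. Reading from and writing to your actual datasets is left out.

```python
import pandas as pd

# Sample rows copied from the question (built in memory for illustration)
df = pd.DataFrame(
    {
        "Code": ["BT01"] * 6,
        "ID": ["A", "d", "e", "f", "g", "h"],
        "Date": [
            "2022-04-30T00:00:00.000Z",
            "2022-04-30T00:00:00.000Z",
            "2022-03-30T00:00:00.000Z",
            "2022-04-30T00:00:00.000Z",
            "2022-05-30T00:00:00.000Z",
            "2022-05-30T00:00:00.000Z",
        ],
    }
)

# Parse the ISO 8601 strings so the comparison is on dates, not text
df["Date"] = pd.to_datetime(df["Date"])

# Flag every row whose date equals the column-wide maximum
# ("is_max_date" is a hypothetical column name)
df["is_max_date"] = df["Date"] == df["Date"].max()

# Keep only the flagged rows and drop the helper column
latest = df[df["is_max_date"]].drop(columns="is_max_date")

print(latest)
```

If the dataset holds several codes and you need the latest date per code, comparing against `df.groupby("Code")["Date"].transform("max")` instead of `df["Date"].max()` should give a per-group flag.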
Hope this helps.