Updating existing dataset
While ingesting new data, is there a code or visual recipe that can update old records and append new records based on the value of a particular column (metric)?
The input dataset has 3 records. On the next run we receive 3 more records: 1 is identical to an existing record, 1 is an updated version of an existing record, and 1 is new.
In the result set, we want the new record appended, the old record updated, and the unchanged record kept as is.
Answers
-
Tuong-Vi Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Neuron 2020, Dataiku DSS Adv Designer, Registered, Neuron 2021, Neuron 2022 Posts: 33 Partner
Hello @sagar_dubey,
I tried adding a current-date column to each block of records, then used a Window recipe partitioned by id, with the max of the date selected in the aggregations.
In the post-filter I set the condition: date is the same as date_max.
The output contains the data I want.
I don't know if this helps; there may be an easier way to solve this use case with a script or with partitioning.
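For anyone who prefers a code recipe, the same idea can be sketched in pandas. This is only an illustration, not the Window recipe itself: the column names `id`, `metric`, and `load_date`, the helper `keep_latest`, and the sample values are all assumptions made up for the example.

```python
import pandas as pd

def keep_latest(df, key="id", date_col="load_date"):
    """Keep, for each key, only the row(s) carrying the most recent date."""
    out = df.copy()
    # Equivalent of the window step: max date per id
    out["date_max"] = out.groupby(key)[date_col].transform("max")
    # Equivalent of the post-filter: date is the same as date_max
    out = out[out[date_col] == out["date_max"]].drop(columns="date_max")
    # If an id appears twice with the same date, keep the last row seen
    out = out.drop_duplicates(subset=key, keep="last")
    return out.sort_values(key).reset_index(drop=True)

# Hypothetical data: 3 existing records, then 3 incoming records
previous = pd.DataFrame({"id": [1, 2, 3], "metric": [10, 20, 30]})
previous["load_date"] = pd.Timestamp("2023-01-01")
incoming = pd.DataFrame({"id": [1, 2, 4], "metric": [10, 25, 40]})
incoming["load_date"] = pd.Timestamp("2023-01-02")

# Stack both loads, then keep the latest version of each id
stacked = pd.concat([previous, incoming], ignore_index=True)
result = keep_latest(stacked)
print(result)
```

In this sample, id 1 is unchanged, id 2 is updated, id 3 is kept from the first load, and id 4 is appended.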
-
Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭
Interesting case.
Do you think my approach would work? It might look a bit odd, but I think it works. Basically, I created 2 Excel files with the same name and joined them: the published Excel file and the Excel file with the same name that I upload to the folder. The join updated the price for rows that already existed and added the new ones.
Mateusz
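The join-based idea can also be sketched in pandas, under the assumption that both files share a key column and a single value column. The names `id` and `price`, the helper `upsert_by_join`, and the sample values are hypothetical and only serve the illustration.

```python
import pandas as pd

def upsert_by_join(published, uploaded, key="id", value="price"):
    """Full outer join on key; prefer the uploaded value when one exists."""
    merged = published.merge(
        uploaded, on=key, how="outer", suffixes=("_old", "_new")
    )
    # Take the new value if present, otherwise fall back to the old one.
    # Caveat: a genuinely empty new value also falls back to the old value.
    merged[value] = merged[f"{value}_new"].fillna(merged[f"{value}_old"])
    return merged[[key, value]].sort_values(key).reset_index(drop=True)

# Hypothetical data standing in for the two Excel files
published = pd.DataFrame({"id": [1, 2, 3], "price": [10.0, 20.0, 30.0]})
uploaded = pd.DataFrame({"id": [1, 2, 4], "price": [10.0, 25.0, 40.0]})

result = upsert_by_join(published, uploaded)
print(result)
```

This version needs no date column, but it relies on the uploaded file always being the newer one.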
-
Tuong-Vi Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Neuron 2020, Dataiku DSS Adv Designer, Registered, Neuron 2021, Neuron 2022 Posts: 33 Partner
Hello,
I think it works too, but adding a time dimension seems important for selecting the latest version of each id, though that depends on how the data are stored.
-
Mateusz Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭
You are right,
it all depends on what the real data looks like.