
Updating existing dataset

sagar_dubey
Level 1

When ingesting new data, is there a code or visual recipe that can update the old records and append new records, based on the value of a particular column (metric)?

 

The input dataset has 3 records. In the next run we get 3 more records: 1 is the same as before, 1 is an updated version of an existing record, and 1 is a new record.

In the result set, we want the new record appended, the old record updated, and the unchanged record kept as it is.

4 Replies
Tuong-Vi
Neuron

Hello @sagar_dubey ,

I tried adding a current-date column to each block of loaded data, then used a Window recipe partitioned by id, with the max of the date selected in the aggregations:

(screenshot: win1.PNG)

In the post-filter I set the condition: date is the same as date_max

The output contains the data I want:

(screenshot: win2.PNG)

I don't know if this helps; there may be an easier way to solve this use case with a script or with partitioning...
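
For anyone who prefers the script route, here is a rough sketch of the same "latest row per id" logic in plain Python. It is only an illustration under assumptions: the column names (id, metric, load_date) and the sample rows are made up to mirror the question, and in a Python recipe you would more likely do this on a DataFrame. Note one difference from the filter "date is the same as date_max": if two rows for the same id share the same date, this sketch keeps only the first one.

```python
from datetime import date


def keep_latest_per_id(rows, key="id", date_col="load_date"):
    """Keep, for each key, the row with the greatest date_col value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Strict ">" means that on a date tie the first row seen wins.
        if current is None or row[date_col] > current[date_col]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical data mirroring the question: the second run repeats id 1,
# updates id 2 and adds id 4.
run_1 = [
    {"id": 1, "metric": 10, "load_date": date(2020, 1, 1)},
    {"id": 2, "metric": 20, "load_date": date(2020, 1, 1)},
    {"id": 3, "metric": 30, "load_date": date(2020, 1, 1)},
]
run_2 = [
    {"id": 1, "metric": 10, "load_date": date(2020, 1, 2)},
    {"id": 2, "metric": 25, "load_date": date(2020, 1, 2)},
    {"id": 4, "metric": 40, "load_date": date(2020, 1, 2)},
]

result = keep_latest_per_id(run_1 + run_2)
for row in result:
    print(row["id"], row["metric"], row["load_date"])
```

The result has one row per id: id 2 carries the updated metric, id 3 is untouched, and id 4 is appended.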

emate
Neuron

Hi @sagar_dubey @Tuong-Vi 

Interesting case 😄 Do you think my approach would work? It might look a bit odd, but I think it works.

Basically, I created 2 Excel files with the same name and joined them, kind of joining the file on itself: I join the published Excel file with the Excel file of the same name that I upload to the folder. The join updated the price for rows that already existed and added the new ones.
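
To make the idea concrete, this is roughly what the join does, written as a small Python sketch. It is not the actual recipe configuration: the `upsert` helper, the column names (id, product, price) and the sample rows are all hypothetical. The logic is a full outer join on the key where values from the uploaded file win whenever the key already exists.

```python
def upsert(existing, incoming, key="id"):
    """Full-outer-join style merge: incoming values win on matching keys."""
    # Copy each existing row so the input list is left unmodified.
    merged = {row[key]: dict(row) for row in existing}
    for row in incoming:
        # Existing key: overwrite its columns; new key: add the row.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical price lists; column names are illustrative only.
published = [
    {"id": 1, "product": "A", "price": 10.0},
    {"id": 2, "product": "B", "price": 20.0},
    {"id": 3, "product": "C", "price": 30.0},
]
uploaded = [
    {"id": 1, "product": "A", "price": 10.0},  # unchanged
    {"id": 2, "product": "B", "price": 22.5},  # updated price
    {"id": 4, "product": "D", "price": 40.0},  # new row
]

merged_rows = upsert(published, uploaded)
for row in merged_rows:
    print(row["id"], row["product"], row["price"])
```

Unlike the date-based approach, this relies on the order of the two inputs: whatever is passed as `incoming` is treated as the newer version.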

(screenshots: input_1.png, input2.png, flow.png, join1.png, join2.png, output.png)

 

Mateusz

 

Tuong-Vi
Neuron

Hello,

I think it works too, but adding a time dimension seems important in order to select the latest version of each id (it depends on how the data is stored)?

emate
Neuron

You are right 🙂 It all depends on what the real data looks like.
