Updating existing dataset

sagar_dubey
sagar_dubey Partner, Registered Posts: 17 Partner

When ingesting new data, is there a code or visual recipe that can update existing records and append new ones based on the value of a particular column (metric)?

The input dataset has 3 records. In the next run we get 3 more records: 1 is identical to an existing record, 1 is an updated version of an existing record, and 1 is a new record.

In the result set, we want the new record to be appended, the updated record to replace the old one, and the unchanged record to be kept as it is.
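To pin down the semantics being asked for, here is a minimal plain-Python sketch of an "upsert" by key. It assumes each record is a dict and that the key column is named `id`; both the column name and the sample values are hypothetical, and this shows only the logic, not a Dataiku recipe or API.

```python
def upsert(existing, incoming, key="id"):
    """Merge incoming records into existing ones by key.

    A record whose key already exists replaces the old version (so an
    identical record leaves the data unchanged); a record with a new key
    is appended. Existing keys keep their position, new keys follow.
    """
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # overwrite if the key exists, otherwise insert
    return list(merged.values())


# Hypothetical data mirroring the question: 3 old rows, then 3 incoming rows
old = [
    {"id": 1, "price": 10},
    {"id": 2, "price": 20},
    {"id": 3, "price": 30},
]
new = [
    {"id": 1, "price": 10},  # same as before
    {"id": 2, "price": 25},  # updated record
    {"id": 4, "price": 40},  # new record
]
result = upsert(old, new)
```

Because Python dicts preserve insertion order, updating an existing key keeps its original position, so the result lists ids 1, 2, 3 and then the new id 4.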

Answers

  • Tuong-Vi
    Tuong-Vi Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Neuron 2020, Dataiku DSS Adv Designer, Registered, Neuron 2021, Neuron 2022 Posts: 33 Partner

    Hello @sagar_dubey,

    I added a current-date column to each block of incoming data, then built a Window recipe partitioned by id, with the max of the date selected in the aggregations:

    [Screenshot: win1.PNG, Window recipe settings]

    In the post-filter, I set the condition: date is the same as date_max.

    The output contains the data I want:

    [Screenshot: win2.PNG, recipe output]

    I don't know if this helps. There may be an easier way to solve this use case with a script or with partitioning.
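    The Window recipe plus post-filter approach above can be sketched in plain Python as follows: all loads are stacked with a date column, the max date is computed per id (the window step), and only rows whose date equals that max are kept (the post-filter step). The column names `id`, `price` and `date`, and the sample rows, are assumptions for illustration; this mimics the logic, not the Dataiku recipe itself.

```python
from datetime import date


def latest_per_key(rows, key="id", ts="date"):
    # Window step: compute max(date) partitioned by key
    date_max = {}
    for row in rows:
        k = row[key]
        if k not in date_max or row[ts] > date_max[k]:
            date_max[k] = row[ts]
    # Post-filter step: keep rows whose date equals the max date for their key
    return [row for row in rows if row[ts] == date_max[row[key]]]


# Hypothetical stacked data: first load on Jan 1, second load on Jan 2
stacked = [
    {"id": 1, "price": 10, "date": date(2022, 1, 1)},
    {"id": 2, "price": 20, "date": date(2022, 1, 1)},
    {"id": 3, "price": 30, "date": date(2022, 1, 1)},
    {"id": 1, "price": 10, "date": date(2022, 1, 2)},
    {"id": 2, "price": 25, "date": date(2022, 1, 2)},
    {"id": 4, "price": 40, "date": date(2022, 1, 2)},
]
latest = latest_per_key(stacked)
```

One design caveat: if the same id appears twice with the same date (for example, two loads on the same day), both rows pass the filter, so the date column needs enough precision (such as a full timestamp) to make each load distinct.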

  • emate
    emate Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭

    Hi @sagar_dubey, @Tuong-Vi,

    Interesting case. Do you think my approach would work? It might look a bit odd, but I think it works.

    Basically, I created two Excel files with the same name and, in a way, join the file with itself: I join the published Excel dataset with the Excel file of the same name that I upload to the folder. The join updated the price for rows that already existed and added the new row.
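    The join-based merge described above can be sketched as a full outer join on the key in which each column takes the new value when one is present and otherwise keeps the old one. This plain-Python version assumes a key column named `id` and that a missing (`None`) value in the new file should not overwrite the old value; the actual join type and column selection used in the flow are not visible in the text, so treat this as an illustration of the idea only.

```python
def outer_join_prefer_new(old_rows, new_rows, key="id"):
    old_by_key = {r[key]: r for r in old_rows}
    new_by_key = {r[key]: r for r in new_rows}
    # Full outer join: every key from either side, old keys first
    keys = list(old_by_key) + [k for k in new_by_key if k not in old_by_key]
    result = []
    for k in keys:
        old_row = old_by_key.get(k, {})
        new_row = new_by_key.get(k, {})
        # Column-wise coalesce: a non-None new value wins, else keep the old value
        merged = {**old_row, **{c: v for c, v in new_row.items() if v is not None}}
        result.append(merged)
    return result


# Hypothetical published data and newly uploaded file
published = [{"id": 1, "price": 10}, {"id": 2, "price": 20}, {"id": 3, "price": 30}]
uploaded = [{"id": 1, "price": 10}, {"id": 2, "price": 25}, {"id": 4, "price": 40}]
joined = outer_join_prefer_new(published, uploaded)
```

Unlike a plain row-level replace, this coalesces column by column, so a blank cell in the new file leaves the existing value in place.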

    [Screenshots: input_1.png, input2.png, flow.png, join1.png, join2.png, output.png]

    Mateusz

  • Tuong-Vi
    Tuong-Vi Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Neuron 2020, Dataiku DSS Adv Designer, Registered, Neuron 2021, Neuron 2022 Posts: 33 Partner

    Hello,

    I think that works too, but an insert-time dimension seems important for selecting the latest version of each id (depending on how the data are stored)?

  • emate
    emate Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭

    You are right, it all depends on what the real data looks like.
