Sync only new records

Options
sylvyr3
sylvyr3 Registered Posts: 21 ✭✭✭✭

Is there a 'simple' way to use a sync recipe or other recipe to transfer only the new records from a source table?

I haven't fleshed out this entire thought yet but my table does have an identity column so that makes things slightly easier:

  1. Create python script to grab max(identity) from destination table
  2. Update global variable with latest max(identity)
  3. Have a MS SQL server dataset in SQL Query mode with a where clause specifying that the source identity > max(identity) from destination table
  4. Now sync (append instead of overwrite) to destination table

Let me know if there is another way that I'm not aware of to accomplish the same.

Thanks!

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @sylvyr3

    An interesting question.

    You seem to be focusing on the part of the problem that is about adding new records. And that is an important part of many scenarios.

    However, when working through similar problems I also find myself also thinking about the questions of deleted records and modified records. There are some situations like financial records and logs where existing records are not likely to change. However, there are lots of scenarios like Customer Directories where existing records are both modified and deleted. In those cases, an incremental addition can work.

    Before going down the road you are laying out, I invite you to consider if either of these two additional aspects will be part of your situation?

  • sylvyr3
    sylvyr3 Registered Posts: 21 ✭✭✭✭
    Options

    Thanks for the input. Fortunately the application that populates the table only inserts records and does not modify or delete records.

  • Manuel
    Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
    Options

    Hi,

    You should look into the partitions functionality, which enables you to process just certain chunks of data. In your case, you would need your new records to be identified by a unique partition id, perhaps the insert date.

    Watch this video from the academy to get started on partitions:

    //play.vidyard.com/w7ozrFT3REzedHvv6hKriJ.html?

    I hope this helps.

    Best regards

Setup Info
    Tags
      Help me…