Ready for Dataiku 10? Try out the Crash Course on new features!GET STARTED

Sync only new records

sylvyr3
Level 3
Sync only new records

Is there a 'simple' way to use a sync recipe or other recipe to transfer only the new records from a source table?

I haven't fleshed out this entire thought yet but my table does have an identity column so that makes things slightly easier:

  1. Create python script to grab max(identity) from destination table
  2. Update global variable with latest max(identity)
  3. Have a MS SQL server dataset in SQL Query mode with a where clause specifying that the source identity > max(identity) from destination table
  4. Now sync (append instead of overwrite) to destination table

Let me know if there is another way that I'm not aware of to accomplish the same.  

Thanks!

3 Replies
tgb417
Neuron
Neuron

@sylvyr3 

An interesting question.

You seem to be focusing on the part of the problem that is about adding new records.  And that is an important part of many scenarios.

However, when working through similar problems I also find myself also thinking about the questions of deleted records and modified records.  There are some situations like financial records and logs where existing records are not likely to change.  However, there are lots of scenarios like Customer Directories where existing records are both modified and deleted.  In those cases, an incremental addition can work.

Before going down the road you are laying out, I invite you to consider if either of these two additional aspects will be part of your situation?

 

--Tom
0 Kudos
sylvyr3
Level 3
Author

Thanks for the input.  Fortunately the application that populates the table only inserts records and does not modify or delete records. 

Manuel
Dataiker
Dataiker

Hi,

You should look into the partitions functionality, which enables you to process just certain chunks of data. In your case, you would need your new records to be identified by a unique partition id, perhaps the insert date.

Watch this video from the academy to get started on partitions:

I hope this helps.

Best regards

Setup info

?
A banner prompting to get Dataiku DSS