How to sync a partitioned dataset only for partitions not in the output

Dataiker
I have a sync recipe with one partitioned dataset in input and one partitioned dataset in output. Partitioning is by hour.

The input dataset receives new data continuously. Currently I build the output dataset manually by selecting the new dates, with the "append instead of overwrite" option enabled.

This is obviously not optimal, as it involves manual intervention.

What would be a solution to sync only the input partitions that are not yet in the output? (Other than job scheduling, which could be too costly.)
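
To make the goal concrete, here is a minimal sketch of the logic I have in mind: take the partition identifiers of the input, subtract those already in the output, and build only the difference. The function name `missing_partitions` and the sample identifiers are hypothetical. I am assuming the two lists could be obtained inside DSS with something like `list_partitions()` on each dataset, and that hourly identifiers look like `YYYY-MM-DD-HH`; neither is verified here.

```python
from typing import Iterable, List


def missing_partitions(input_parts: Iterable[str], output_parts: Iterable[str]) -> List[str]:
    """Return partition ids present in the input but absent from the output, sorted."""
    return sorted(set(input_parts) - set(output_parts))


# Hypothetical hour-partition identifiers, for illustration only.
# In DSS these lists would presumably come from each dataset's partition
# listing (assumption, not shown here so the sketch stays self-contained).
input_parts = ["2020-05-27-10", "2020-05-27-11", "2020-05-27-12"]
output_parts = ["2020-05-27-10"]

to_build = missing_partitions(input_parts, output_parts)
print(to_build)  # ['2020-05-27-11', '2020-05-27-12']
```

The resulting list would then be passed as the partitions to build for the output dataset, so that only the new hours are synced.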