How to sync a partitioned dataset only for partitions not in the output
Alex_Combessie
Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
I have a sync recipe with one partitioned dataset in input and one partitioned dataset in output. Partitioning is by hour.
The input dataset receives new data continuously. Today I build the output dataset manually by selecting the new dates and using the append option instead of overwrite.
This is obviously not optimal, as it involves manual intervention.
What would be a solution to sync only the partitions from the input that are not yet in the output? (Other than job scheduling, which could be too costly.)
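To make the goal concrete, the selection I am after is essentially a set difference between the two partition lists. Below is a minimal sketch in plain Python. The function name `missing_partitions` and the sample identifiers are hypothetical, and the `YYYY-MM-DD-HH` format for hourly partitions is an assumption. Inside DSS the two lists could presumably come from something like `dataiku.Dataset(...).list_partitions()`, but that should be checked against the documentation.

```python
def missing_partitions(input_partitions, output_partitions):
    """Return the partition ids found in the input but not in the output, sorted."""
    return sorted(set(input_partitions) - set(output_partitions))


# Hypothetical hourly partition identifiers (assumed format: YYYY-MM-DD-HH).
input_ids = ["2019-03-01-00", "2019-03-01-01", "2019-03-01-02"]
output_ids = ["2019-03-01-00"]

# Only these partitions would need to be synced.
to_build = missing_partitions(input_ids, output_ids)
print(to_build)
```

The resulting list is what I currently pick by hand when building the output.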