I use a Twitter connector to collect tweets. By default, the Twitter data is partitioned by hour in Dataiku. In my flow I want to use a coarser granularity and partition by day (or week, month, ...), e.g. to calculate the sentiment of all tweets per day (or week, month). This can be done by using a Time Range when syncing the Twitter data to a new (output) dataset. However, sometimes no tweets are detected in a particular hour, so the partition for that hour does not exist in the input. When I run the sync recipe in that case, it fails with an error saying that the partition in question is missing; processing stops and the output dataset is not filled. How can I sync when not all of the input partitions that should make up an output partition are available?
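
To make the situation concrete, here is a minimal sketch in plain Python; it does not use the Dataiku API. It assumes hourly partition identifiers of the form `YYYY-MM-DD-HH`, and the function names and the sample list of existing partitions are hypothetical. It only shows which hourly partitions a daily output partition would expect and which of them are absent; it is not a fix for the sync recipe itself.

```python
from datetime import date, datetime, timedelta


def expected_hour_partitions(day):
    """Return the 24 hourly partition ids (assumed format YYYY-MM-DD-HH) for one day."""
    start = datetime(day.year, day.month, day.day)
    return [(start + timedelta(hours=h)).strftime("%Y-%m-%d-%H") for h in range(24)]


def split_partitions(day, existing):
    """Split a day's expected hourly partitions into those present and those missing."""
    existing = set(existing)
    expected = expected_hour_partitions(day)
    present = [p for p in expected if p in existing]
    missing = [p for p in expected if p not in existing]
    return present, missing


# Hypothetical input: tweets were collected in only three hours of the day.
existing = ["2020-01-01-00", "2020-01-01-01", "2020-01-01-03"]
present, missing = split_partitions(date(2020, 1, 1), existing)
print(len(present), "present;", len(missing), "missing")
```

What I am looking for is a way to have the daily output built from the `present` partitions only, instead of failing because of the `missing` ones.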
Peter van Klaveren