Hi Dataiku Team,
I have a dataset that gets updated every hour. This dataset is not partitioned. I would like to load only the new hourly partition every hour.
I have done the following:
1- Sync the source dataset to a destination dataset
2- Partition the destination dataset by timestamp
3- Edit the sync recipe and enable redispatch partitioning, because the source is not partitioned
4- Build the dataset
Everything looks good and the dataset was partitioned correctly. The issue comes when I try to get the next hour's partition.
I created a scenario with a build step that uses the PREVIOUS_HOUR partition. The issue is that DSS tries to reload the whole dataset and repartition all partitions again.
How can I get only the next hour's partition?
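For readers following along, here is a minimal sketch of what a "previous hour" partition works out to for an hourly time dimension. It assumes hour partitions are identified as `YYYY-MM-DD-HH`; the function name `previous_hour_partition` is hypothetical and is only there for illustration, not part of DSS.

```python
from datetime import datetime, timedelta

def previous_hour_partition(now):
    """Return the hour-partition identifier for the hour before `now`.

    Assumes hourly partitions are named YYYY-MM-DD-HH.
    """
    # Truncate to the start of the current hour, then step back one hour.
    start_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return (start_of_hour - timedelta(hours=1)).strftime("%Y-%m-%d-%H")

# A run at 00:15 on 1 March targets the last hour of 28 February.
print(previous_hour_partition(datetime(2021, 3, 1, 0, 15)))
```

The point is that the scenario step should target a single identifier like this one, rather than every partition of the dataset.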
Hi @Bader, do you have some screenshots of the settings on the sync recipe related to input-output? Something similar to what is shown in the screenshot?
I think this could help give some idea of what might be happening.
Hi @Ignacio_Toledo,
Thanks for your comment. I have fixed the issue with the steps below.
Let's assume we have a source dataset called "dataset_A" (not partitioned).
1- Sync the source dataset to a destination dataset "dataset_B"
2- Partition the destination dataset "dataset_B" by timestamp
3- Edit the sync recipe and enable redispatch partitioning, because the source "dataset_A" is not partitioned
4- Build the dataset "dataset_B"
Now the dataset "dataset_B" is partitioned
5- Create a second sync recipe with "dataset_B" as the source and "dataset_C" as the destination
6- Build "dataset_C" for all partitions
7- Create a scenario with a build step that uses the PREVIOUS_HOUR partition
I hope the steps are clear.
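To illustrate why the extra dataset helps, here is a toy sketch in plain Python that does not use the DSS API. It rests on an assumption that is not confirmed in this thread: a redispatching sync reads the whole unpartitioned input and writes every partition, while a sync between two partitioned datasets copies only the requested partition. The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

HOUR_FORMAT = "%Y-%m-%d-%H"  # assumed hour-partition identifier format

def redispatch(rows):
    """Mimic a redispatching sync: scan every row and fill every partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["timestamp"].strftime(HOUR_FORMAT)].append(row)
    return dict(partitions)

def sync_one_partition(partitioned, partition_id):
    """Mimic a partition-to-partition sync: copy only the requested partition."""
    return {partition_id: list(partitioned.get(partition_id, []))}

# Toy stand-in for dataset_A (not partitioned).
dataset_a = [
    {"timestamp": datetime(2021, 3, 1, 9, 5), "value": 1},
    {"timestamp": datetime(2021, 3, 1, 9, 40), "value": 2},
    {"timestamp": datetime(2021, 3, 1, 10, 15), "value": 3},
]

dataset_b = redispatch(dataset_a)  # touches both hours
dataset_c = sync_one_partition(dataset_b, "2021-03-01-10")  # touches one hour

print(sorted(dataset_b))
print(dataset_c)
```

In this sketch, the step from dataset_A to dataset_B always processes all rows, whereas the step from dataset_B to dataset_C works on a single hour, which matches the behaviour described in the steps above.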