Partitioning dataset & getting next hour partition

Bader · October 2020

Hi Dataiku Team,

I have a dataset that gets updated every hour. This dataset is not partitioned. I would like to load only the new hourly partition every hour.

I have done the following:

1- Sync the source dataset to destination dataset

2- Partition the destination dataset by timestamp

3- Edit the sync recipe and use the redispatch partitioning option, because the source is not partitioned

4- Build the dataset.

Everything looks good and the dataset was partitioned correctly. The issue comes when I try to get the next hour's partition.

I created a scenario with a build step that uses the PREVIOUS_HOUR partition. The issue is that DSS tries to reload the whole dataset and repartition all partitions again.

How can I get only the next hour's partition?
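To illustrate the behaviour described above, here is a minimal conceptual sketch in plain Python. It is not DSS code: the function names, the sample rows, and the `YYYY-MM-DD-HH` identifier format are assumptions made for illustration. It models a redispatch-style step that reads every source row and writes every hourly partition, whichever partition was requested.

```python
from collections import defaultdict
from datetime import datetime


def hour_partition_id(ts):
    # Assumed hourly partition identifier format: YYYY-MM-DD-HH
    return ts.strftime("%Y-%m-%d-%H")


def redispatch(rows, requested_partition=None):
    # Conceptual model only: the source has no partitions, so every row
    # is read and dispatched. requested_partition is deliberately ignored.
    partitions = defaultdict(list)
    for row in rows:
        partitions[hour_partition_id(row["timestamp"])].append(row)
    return dict(partitions)


# Hypothetical sample data
rows = [
    {"timestamp": datetime(2020, 10, 1, 9, 15), "value": 1},
    {"timestamp": datetime(2020, 10, 1, 9, 45), "value": 2},
    {"timestamp": datetime(2020, 10, 1, 10, 5), "value": 3},
]

result = redispatch(rows, requested_partition="2020-10-01-10")
print(sorted(result))  # ['2020-10-01-09', '2020-10-01-10']
```

Even though only the 10:00 partition was requested, both partitions are produced, which matches the full reload seen in the scenario.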

Ignacio_Toledo · October 2020

Hi @Bader, do you have some screenshots of the settings on the sync recipe related to input-output? Something similar to what is shown in the screenshot?

I think this could help to give some idea of what might be happening.

Bader · October 2020

Hi @Ignacio_Toledo,

Thanks for your comment. I have fixed the issue with the steps below:

Let's assume we have a source dataset called "dataset_A" (not partitioned).

1- Sync the source dataset to destination dataset "dataset_B"

2- Partition the destination dataset "dataset_B" by timestamp

3- Edit the sync recipe and use the redispatch partitioning option, because the source "dataset_A" is not partitioned

4- Build the dataset "dataset_B"

Now the dataset "dataset_B" is partitioned.

5- Create a sync recipe where the source dataset is "dataset_B" and the destination is "dataset_C"

6- Build "dataset_C" with all partitions

7- Create a scenario with a build step that uses the PREVIOUS_HOUR partition
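As a rough illustration of what the PREVIOUS_HOUR partition in step 7 refers to, the sketch below computes the identifier of the hour before a given time. It is plain Python, not a DSS API: the helper name `previous_hour_partition` is hypothetical, and the `YYYY-MM-DD-HH` format is an assumption about how hourly partitions are named.

```python
from datetime import datetime, timedelta


def previous_hour_partition(now):
    # Truncate to the start of the current hour, then step back one hour.
    previous = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    # Assumed hourly partition identifier format: YYYY-MM-DD-HH
    return previous.strftime("%Y-%m-%d-%H")


# A run at 00:20 on 1 October targets the last hour of 30 September.
print(previous_hour_partition(datetime(2020, 10, 1, 0, 20)))  # 2020-09-30-23
```

In the scenario itself the PREVIOUS_HOUR keyword is used directly, so this helper would only matter if the identifier had to be computed by hand.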

I hope the steps are clear.

Thanks

Kind regards

Ignacio_Toledo · October 2020

The steps are clear @Bader, thanks. However, you say "I've fixed the issue": does this mean the problem is actually solved and the scenario is working as you expect?

Bader · October 2020

Yes

Ignacio_Toledo · October 2020

OK, good to know then! Thanks!
