Partitioning dataset & getting next hour partition

Bader
Bader Registered Posts: 46 ✭✭✭✭✭

Hi Dataiku Team,

I have a dataset that gets updated every hour. The dataset is not partitioned. I would like to load only the new hourly partition every hour.

I have done the following:

1- Sync the source dataset to a destination dataset

2- Partition the destination dataset by timestamp

3- Edit the sync recipe and enable redispatch partitioning, because the source is not partitioned

4- Build the dataset.

Everything looks good and the dataset was partitioned correctly. The issue comes when I try to get the next hour's partition.

I have created a scenario with a build step that targets the PREVIOUS_HOUR partition. The problem is that DSS tries to reload the whole dataset and repartition all partitions again.

How can I get only the next hour's partition?
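
To illustrate the behaviour described in the question, here is a minimal sketch in plain Python (not the Dataiku API) of what redispatching a non-partitioned input by hour amounts to. The function name `redispatch_by_hour`, the `timestamp` field, and the sample rows are hypothetical, and the `YYYY-MM-DD-HH` key is assumed to match the hourly partition identifier format. Because the source has no partitions, every row has to be read to decide which hour it belongs to, which would be consistent with the full reload reported above.

```python
from collections import defaultdict
from datetime import datetime


def redispatch_by_hour(rows, ts_field="timestamp"):
    """Group rows into hourly buckets keyed by 'YYYY-MM-DD-HH'."""
    partitions = defaultdict(list)
    # Every input row is scanned, whichever hour we are interested in,
    # because a non-partitioned source gives no way to skip rows.
    for row in rows:
        ts = datetime.fromisoformat(row[ts_field])
        partitions[ts.strftime("%Y-%m-%d-%H")].append(row)
    return dict(partitions)


# Hypothetical sample data standing in for the non-partitioned source.
rows = [
    {"timestamp": "2023-05-01T09:15:00", "value": 1},
    {"timestamp": "2023-05-01T09:45:00", "value": 2},
    {"timestamp": "2023-05-01T10:05:00", "value": 3},
]
partitions = redispatch_by_hour(rows)
print(sorted(partitions))
```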


Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron

    Hi @Bader, do you have some screenshots of the input/output settings of the sync recipe? Something similar to what is shown in the screenshot?

    [Screenshot: Selection_357.png]

    I think this could help give some idea of what might be happening.

  • Bader

    Hi @Ignacio_Toledo,

    Thanks for your comment. I have fixed the issue with the steps below.

    Let's assume we have a source dataset called "dataset_A" (not partitioned).

    1- Sync the source dataset to a destination dataset, "dataset_B"

    2- Partition the destination dataset "dataset_B" by timestamp

    3- Edit the sync recipe and enable redispatch partitioning, because the source "dataset_A" is not partitioned

    4- Build the dataset "dataset_B"

    Now the dataset "dataset_B" is partitioned.

    5- Create a sync recipe whose source is "dataset_B" and whose destination is "dataset_C"

    6- Build "dataset_C" with all partitions

    7- Create a scenario with a build step that targets the PREVIOUS_HOUR partition

    I hope the steps are clear.

    Thanks

    Kind regards
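
As a rough illustration of step 7, the sketch below shows which partition a PREVIOUS_HOUR build step would be expected to target at a given run time. It is plain Python, not Dataiku code: the helper name `previous_hour_partition` is hypothetical, and it assumes that the keyword is resolved relative to the scenario's run time and that hourly partition identifiers use the `YYYY-MM-DD-HH` format.

```python
from datetime import datetime, timedelta


def previous_hour_partition(now):
    """Return the 'YYYY-MM-DD-HH' identifier of the hour before `now`."""
    # Truncate to the start of the current hour, then step back one hour,
    # so that day and month boundaries are handled by datetime arithmetic.
    start_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return (start_of_hour - timedelta(hours=1)).strftime("%Y-%m-%d-%H")


# A run at 00:20 on 1 May targets the last hour of 30 April.
print(previous_hour_partition(datetime(2023, 5, 1, 0, 20)))  # 2023-04-30-23
```

Comparing this identifier with the partitions actually rebuilt by a scenario run is one way to check that only the expected hour is being processed.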

  • Ignacio_Toledo

    The steps are clear, @Bader, thanks. However, you say "I've fixed the issue"; does this mean that the problem is actually solved and the scenario is working as you expect?

  • Bader
  • Ignacio_Toledo

    OK, good to know then! Thanks!
