Incremental Load in Dataiku

AHerrera101499
AHerrera101499 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 14

Hi everyone, I want to create an incremental load in Dataiku. For example, I have a transactional DB and need to move data through an ETL process to an analytical DB, but I only want to read from the transactional DB the rows selected by the Last_modified date field. I don't want to read the entire database each time I run the process, just the rows that fulfill the condition.


Operating system used: Linux

Answers

  • AmandaM
    AmandaM Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 9 Dataiker

    Hello,

    You can use time-based partitioning so that each run processes only the data for a specific date (or another dimension). See an example from our Knowledge Base here.

    Hope this helps!

  • AHerrera101499
    AHerrera101499 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 14

    OK, that's a great alternative, but what if I want to use a value from another dataset? For example, could I first check the max Create_time field in my analytical database to know where I need to resume from, and then use that date to read only the corresponding data?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,727 Neuron

    If you want to use a value from another dataset then use a join recipe.

  • AmandaM
    AmandaM Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 9 Dataiker

    Also, project variables can be used as partition identifiers!
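
The pattern asked about in the thread (look up the latest timestamp already present in the analytical DB, then read only newer rows from the transactional DB) is often called a high-water-mark load. Below is a minimal sketch of that logic using Python's built-in sqlite3 module so it runs anywhere. It does not use Dataiku's API: the table name `transactions`, its columns, and the helpers `make_db` and `incremental_load` are all hypothetical, and in DSS the same query logic would presumably live in a SQL or Python recipe.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `transactions` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)"
    )
    return conn


def incremental_load(source, target):
    """Copy only the source rows newer than the target's high-water mark."""
    # High-water mark: latest last_modified already loaded (None if target is empty).
    (watermark,) = target.execute(
        "SELECT MAX(last_modified) FROM transactions"
    ).fetchone()

    if watermark is None:
        # First run: nothing loaded yet, so read everything once.
        rows = source.execute(
            "SELECT id, amount, last_modified FROM transactions"
        ).fetchall()
    else:
        # Later runs: read only rows modified after the watermark.
        rows = source.execute(
            "SELECT id, amount, last_modified FROM transactions "
            "WHERE last_modified > ?",
            (watermark,),
        ).fetchall()

    # Upsert, so a row modified since the last run replaces its old version.
    target.executemany(
        "INSERT OR REPLACE INTO transactions (id, amount, last_modified) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return len(rows)


if __name__ == "__main__":
    src, tgt = make_db(), make_db()
    src.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, 10.0, "2024-01-01 00:00:00"), (2, 20.0, "2024-01-02 00:00:00")],
    )
    print("rows loaded:", incremental_load(src, tgt))
    print("rows loaded on rerun:", incremental_load(src, tgt))
```

Two assumptions are worth checking against real data: timestamps are stored as ISO-formatted text so that string comparison matches chronological order, and the strict `>` comparison can skip a row that shares the exact timestamp of the current watermark but was written after the previous run.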
