Hi everyone, I want to create an incremental load in Dataiku. For example, I have a transactional DB and need to move data to an analytical DB in an ETL process, but I only want to read the rows from the transactional DB that match a condition on the Last_modified date field. I don't want to read the entire database each time I run the process, just the rows that fulfill the condition.
Operating system used: Linux
Hello,
You can use a time-based partition to process only the data for a specific date (or other dimension). See an example from our Knowledge Base here.
Hope this helps!
OK, great alternative, but what if I want to use a value from another dataset? For example, can I first check the max Create_time value in my analytical database to know where I need to resume from, and then use that date to process only the data from that point on?
If you want to use a value from another dataset, use a Join recipe.
Also project variables can be used as partition identifiers!
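To make the "read the max date from the analytical side, then pull only newer rows" idea concrete, here is a minimal sketch of that high-watermark pattern in plain Python. It is not Dataiku-specific: it uses the standard-library sqlite3 module with in-memory databases so it can run anywhere, and the table name `transactions`, its columns, and the helpers `make_db` and `incremental_load` are all made up for illustration. Dates are assumed to be stored as ISO-8601 strings, so they compare correctly as text.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory DB with a hypothetical `transactions` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def incremental_load(source, target):
    """Copy only the source rows newer than the target's current watermark."""
    # 1. Read the watermark from the analytical side (None if it is empty).
    (watermark,) = target.execute(
        "SELECT MAX(last_modified) FROM transactions"
    ).fetchone()

    # 2. Read only the rows modified after the watermark from the source.
    query = "SELECT id, amount, last_modified FROM transactions"
    params = ()
    if watermark is not None:
        query += " WHERE last_modified > ?"
        params = (watermark,)
    rows = source.execute(query, params).fetchall()

    # 3. Upsert into the target so updated rows overwrite their old version.
    target.executemany(
        "INSERT OR REPLACE INTO transactions (id, amount, last_modified) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return len(rows)


# Sample data: the target already holds row 1, the source has two newer rows.
source = make_db([
    (1, 10.0, "2024-01-01"),
    (2, 20.0, "2024-01-02"),
    (3, 30.0, "2024-01-03"),
])
target = make_db([(1, 10.0, "2024-01-01")])

loaded = incremental_load(source, target)
print(f"Rows loaded on this run: {loaded}")
```

One design note: the strict `>` comparison can miss rows that are written later with exactly the same timestamp as the current watermark. If that can happen in your data, using `>=` together with the upsert is a safer choice, at the cost of re-reading a few rows.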