Added on March 5, 2025 5:18AM
Hi all,
I am new to Dataiku and would like to know the right way to output data for a specific time range using partitioning.
What I want to do is dynamically build the most recent 3 days of data from the input datasets, using a Time Range partition dependency on the day dimension.
I've tested this, but the output grows larger than I expect, and its time range remains the same as that of the input dataset.
My input is partitioned by day, and the Python recipe settings are as shown below:
My output is not a partitioned dataset. (I've also tried giving the output dataset the same partitioning as the input, but the result is the same. I've tried append instead of overwrite as well.)
Thanks for your help!
Operating system used: Windows
Hi,
So, specifying partition dependencies and having an unpartitioned output dataset should collect the last 3 days only.
With append disabled, the dataset should not contain more than 3 days' worth of data if the partition is set up correctly on the input dataset.
If your output is partitioned, then it will collect all 3 days for each day/partition in the output; this is present for data archival use cases, but unlikely what you are looking for here.
https://knowledge.dataiku.com/latest/automation/partitioning/concept-redispatch.html#collecting-partitions
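To illustrate the difference in row counts, here is a minimal plain-Python sketch. It does not call any Dataiku API; it only mimics the collecting logic. The helper `recent_day_partitions`, the 10 rows per partition, and the assumption that the time range covers the 3 days ending on the target day are all hypothetical and only for illustration.

```python
from datetime import date, timedelta


def recent_day_partitions(target: date, days: int = 3) -> list[str]:
    """Return day-partition IDs (YYYY-MM-DD) for the `days` days ending on `target`.

    Hypothetical helper: assumes the time range ends on the target day.
    """
    return [
        (target - timedelta(days=offset)).isoformat()
        for offset in range(days - 1, -1, -1)
    ]


# Assumption for illustration: every input day partition holds 10 rows.
rows_per_partition = 10

# Unpartitioned output, append disabled: a single build collects one 3-day window.
collected = recent_day_partitions(date(2025, 3, 5))
unpartitioned_rows = len(collected) * rows_per_partition

# Partitioned output: every output partition that gets built collects its own
# 3-day window, so overlapping days are stored more than once.
output_partitions = recent_day_partitions(date(2025, 3, 5))
partitioned_rows = sum(
    len(recent_day_partitions(date.fromisoformat(p))) * rows_per_partition
    for p in output_partitions
)

print(collected)
print(unpartitioned_rows, partitioned_rows)
```

Under these assumptions, the unpartitioned output holds 3 days of rows, while building 3 partitions of a partitioned output stores 3 windows of 3 days each, which may explain an output that grows more than expected.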
Let me know if that helps, or if this is not the behavior you are observing.
Thanks