How to dynamically output the most recent 3 days of data from a partitioned dataset

AiYuan

Hi all,

I am new to the Dataiku world and would like to ask about the right way to output data for a specific time range using partitioning.

What I want to do is dynamically build the most recent 3 days of data from the input datasets (using a Time Range dependency on the day partition).
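To make the goal concrete: with day partitioning, a trailing 3-day window ending on a given date corresponds to three partition identifiers. The plain-Python sketch below is not the Dataiku API; `time_range_partitions` is a hypothetical helper, and it assumes day partitions are identified as `YYYY-MM-DD` and that the window includes the target day itself (the exact window DSS resolves depends on how the Time Range dependency is configured).

```python
from datetime import date, timedelta
from typing import List


def time_range_partitions(target: date, days: int) -> List[str]:
    """Hypothetical helper: day-partition IDs for a window of `days` days
    ending on (and including) `target`, oldest first."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = target - timedelta(days=days - 1)
    # date.isoformat() yields 'YYYY-MM-DD', the assumed partition ID format
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]


print(time_range_partitions(date(2023, 5, 10), 3))
# ['2023-05-08', '2023-05-09', '2023-05-10']
```

The window slides with the target date, so a build for 2023-03-01 would read 2023-02-27 through 2023-03-01.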

I've tested this, but the output grows larger than I expect, and its time range remains the same as the input dataset's.

My input is partitioned by day, and the Python recipe settings are shown below:

My output is not a partitioned dataset. (I've also tried giving the output dataset the same partitioning as the input, but the result stays the same. I tried append instead of overwrite as well.)

[Screenshot: Python recipe settings]

Thanks for help!

Operating system used: Windows

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,322 Dataiker

    Hi,
    Specifying the partition dependencies and using an unpartitioned output dataset should collect only the last 3 days.
    With append disabled (i.e., overwrite), the output dataset should not contain more than 3 days' worth of data, provided the partitioning is set up correctly on the input dataset.
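    A toy simulation of why append makes the output keep growing while overwrite does not. It is plain Python rather than Dataiku code: `build_unpartitioned` and the one-row-per-day input are hypothetical, and it assumes each daily build reads the trailing 3 day partitions.

```python
from datetime import date, timedelta
from typing import Dict, List


def trailing_days(target: date, n: int) -> List[str]:
    """Day-partition IDs ('YYYY-MM-DD') for the n days ending on target, oldest first."""
    return [(target - timedelta(days=k)).isoformat() for k in range(n - 1, -1, -1)]


def build_unpartitioned(output: List[str], partitions: Dict[str, List[str]],
                        target: date, n: int, append: bool) -> List[str]:
    """Hypothetical build: read the trailing n partitions, then append or overwrite."""
    selected = [row for day in trailing_days(target, n) for row in partitions.get(day, [])]
    return output + selected if append else selected


# Hypothetical input: one row per day from 2023-05-01 to 2023-05-10
days = [date(2023, 5, 1) + timedelta(days=i) for i in range(10)]
partitions = {d.isoformat(): [f"event@{d.isoformat()}"] for d in days}

overwritten: List[str] = []
appended: List[str] = []
for target in days[2:]:  # daily builds from 2023-05-03 to 2023-05-10
    overwritten = build_unpartitioned(overwritten, partitions, target, 3, append=False)
    appended = build_unpartitioned(appended, partitions, target, 3, append=True)

print(len(overwritten), len(appended))  # 3 24
```

    With overwrite, the output always holds just the latest 3-day window; with append, each build re-adds rows that are already there, which is the kind of growth described in the question.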

    If your output is partitioned, then each day/partition in the output will collect all 3 days; this behavior exists for data archival use cases, but it is unlikely to be what you are looking for here.
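    The same toy model can illustrate the partitioned case: if each output day partition depends on the trailing 3 input days, every input row is copied into up to 3 output partitions. Again, this is a plain-Python sketch with hypothetical names, not how DSS works internally.

```python
from datetime import date, timedelta
from typing import Dict, List


def trailing_days(target: date, n: int) -> List[str]:
    """Day-partition IDs ('YYYY-MM-DD') for the n days ending on target, oldest first."""
    return [(target - timedelta(days=k)).isoformat() for k in range(n - 1, -1, -1)]


def build_partitioned(partitions: Dict[str, List[str]], targets: List[date],
                      n: int) -> Dict[str, List[str]]:
    """Hypothetical build: each output partition collects its trailing n input partitions."""
    return {
        t.isoformat(): [row for day in trailing_days(t, n) for row in partitions.get(day, [])]
        for t in targets
    }


# Hypothetical input: one row per day from 2023-05-01 to 2023-05-10
days = [date(2023, 5, 1) + timedelta(days=i) for i in range(10)]
partitions = {d.isoformat(): [f"event@{d.isoformat()}"] for d in days}

output = build_partitioned(partitions, days, 3)
print(output["2023-05-10"])  # ['event@2023-05-08', 'event@2023-05-09', 'event@2023-05-10']
print(sum(len(rows) for rows in output.values()))  # 27 rows from 10 input rows
```

    Reading all output partitions together therefore returns most rows three times, which suits archiving a snapshot of each window but not a single "last 3 days" table.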

    https://knowledge.dataiku.com/latest/automation/partitioning/concept-redispatch.html#collecting-partitions

    Let me know if that helps, or if this is not the behavior you are observing.

    Thanks
