How to dynamically output the most recent 3 days of data with a partitioned dataset

Registered Posts: 1

Hi all,

I am new to Dataiku, and I'd like to ask what the right way is to output data for a specific time range using partitioning.

What I want to do is dynamically build the most recent 3 days of data from the input datasets (using a Time Range dependency on the day partition).

I've tested this, but the output grows larger than I expect, and its time range remains the same as the input dataset's.

My input is partitioned by day, and the Python recipe settings are as below:

My output is not a partitioned dataset. (I've tried giving the output dataset the same partitioning as the input, but the result remains the same. I've also tried append instead of overwrite.)

Thanks for your help!

Operating system used: Windows

Answers

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,270 Dataiker

    Hi,
    Specifying partition dependencies with an unpartitioned output dataset should collect only the last 3 days.
    With append disabled, the output should not contain more than 3 days' worth of data, provided the partitioning is set up correctly on the input dataset.

    If your output is partitioned, then it will collect all 3 days for each day/partition in the output; this is intended for data archival use cases, but is unlikely to be what you are looking for here.

    https://knowledge.dataiku.com/latest/automation/partitioning/concept-redispatch.html#collecting-partitions
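To make the difference between the two output setups concrete, here is a minimal plain-Python simulation. It does not use the Dataiku API; the names `recent_day_partitions`, `collect`, and `input_partitions` are hypothetical, and it assumes the Time Range dependency is configured to cover the build day plus the two days before it, with day partitions identified as `YYYY-MM-DD`.

```python
from datetime import date, timedelta


def recent_day_partitions(build_day, n_days=3):
    """Return day-partition identifiers (YYYY-MM-DD) for the n_days ending on build_day."""
    return [(build_day - timedelta(days=offset)).isoformat()
            for offset in range(n_days - 1, -1, -1)]


# Hypothetical input dataset: one list of rows per day partition (10 days of data).
input_partitions = {
    (date(2024, 1, 1) + timedelta(days=i)).isoformat(): [f"row-{i}"]
    for i in range(10)
}


def collect(build_day, n_days=3):
    """Gather the rows of the last n_days input partitions that exist."""
    rows = []
    for partition_id in recent_day_partitions(build_day, n_days):
        rows.extend(input_partitions.get(partition_id, []))
    return rows


build_days = (date(2024, 1, 9), date(2024, 1, 10))

# Unpartitioned output with append disabled: each build replaces the previous content.
unpartitioned_output = []
for build_day in build_days:
    unpartitioned_output = collect(build_day)

# Partitioned output: every output partition keeps its own 3-day collection,
# so the total row count grows with each partition that is built.
partitioned_output = {}
for build_day in build_days:
    partitioned_output[build_day.isoformat()] = collect(build_day)

print(unpartitioned_output)
print({pid: len(rows) for pid, rows in partitioned_output.items()})
```

In this sketch the unpartitioned output never holds more than 3 days of rows, while the partitioned output accumulates one 3-day collection per built partition, which would explain an output that keeps growing.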

    Let me know if that helps or if this is not the behavior you are observing.

    Thanks
