SQL Compilation for Date-Partitioned Datasets for recipes that require all Partitions

Pete_Berry_Snap Registered Posts: 1

Hi, this is my first time working with partitions so please let me know if I'm making a mistake.

I have partitioned several datasets by a date column (partition timeframe set to DAY). My scheduled scenarios run the initial recipes on only the latest few days' worth of data, to reduce computation.

There is a point in my flow, however, where a recipe depends on the full dataset (all the daily partitions). This is a Group recipe that computes aggregates for each customer across the full timeframe of sales.

In the visual recipe, I can choose which input partitions are required to build the output dataset. Regardless of whether I select 'all available' or set a custom date range spanning from the earliest date in my data until now, the SQL that gets executed looks like this:

[image.png: screenshot of the generated SQL]

This is clearly inefficient and takes much longer to run than the same query without the date filter. My workaround is to switch to a SQL recipe and remove the date range condition from the WHERE clause. Am I missing a trick or misunderstanding something?
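To illustrate the workaround, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (sales, customer_id, sale_date, amount) are hypothetical, and the shape of the filtered query is an assumption, since the real generated SQL only appears in the screenshot. It shows that a date range covering every partition returns the same aggregates as the query with no date condition at all:

```python
import sqlite3

# Hypothetical schema and data; the real tables are not shown in this post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("c1", "2023-01-01", 10.0),
        ("c1", "2023-01-02", 5.0),
        ("c2", "2023-01-02", 7.5),
        ("c2", "2023-01-03", 2.5),
    ],
)

# Assumed shape of the generated query: a date-range predicate
# that spans every partition in the dataset.
filtered_sql = """
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    WHERE sale_date >= '2023-01-01' AND sale_date <= '2023-01-03'
    GROUP BY customer_id
    ORDER BY customer_id
"""

# Workaround from the SQL recipe: the same aggregation, no date condition.
unfiltered_sql = """
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    GROUP BY customer_id
    ORDER BY customer_id
"""

filtered = conn.execute(filtered_sql).fetchall()
unfiltered = conn.execute(unfiltered_sql).fetchall()

# Both queries return identical per-customer totals.
print(filtered == unfiltered)
```

When the range spans all partitions the results are identical, so the date condition only adds work for the database.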

Thanks.
