SQL OVER(PARTITION BY ... ORDER BY partition_key) on partitioned dataset

julesbertrand · May 2022

Hi,

I have a dataset partitioned by month_key (string type, not date), e.g. 202112 for December 2021.

I am working on customer data, and I want to flag changes from month to month in a column.

Original table is like

client_id	month_key (partition key)	client_attribute
000000001	202110	A
000000001	202111	A
000000001	202112	B
000000001	202201	B

And I want:

client_id	month_key (partition key)	client_attribute_change_flag
000000001	202110	False
000000001	202111	False
000000001	202112	True
000000001	202201	False

I successfully used an SQL OVER clause on an unpartitioned version of the dataset:

client_attribute != LAG(client_attribute) OVER (PARTITION BY `client_id` ORDER BY `month_key`) AS client_attribute_change_flag

Now it is partitioned, i can't re use the same recipe AS I don't have access to month_key and month_key - 1. Do you have any advice on how to do this ? Should I temporarily un partition the dataset, compute my flag column and join back with my partitioned dataset ?

Thanks

Jules

SQL OVER(PARTITION BY ... ORDER BY partition_key) on partitioned dataset

Categories

Setup Info

Tags