Submit your inspiring success story or innovative use case to the 2022 Dataiku Frontrunner Awards! ENTER YOUR SUBMISSION

SQL OVER(PARTITION BY ... ORDER BY partition_key) on partitioned dataset

julesbertrand
Level 2
Level 2
SQL OVER(PARTITION BY ... ORDER BY partition_key) on partitioned dataset

Hi, 

I have a dataset partitioned by month_key (string type, not date), e.g. 202112 for December 2021.

I am working on customer data, and I want to flag changes from month to month in a column.

Original table is like

client_idmonth_key (partition key)client_attribute
000000001202110A
000000001202111A
000000001202112B
000000001202201B

 

And I want: 

client_idmonth_key (partition key)client_attribute_change_flag
000000001202110False
000000001202111False
000000001202112True
000000001202201False

 

I successfully used an SQL OVER clause on an unpartitioned version of the dataset:

client_attribute != LAG(client_attribute) OVER (PARTITION BY `client_id` ORDER BY `month_key`) AS client_attribute_change_flag

Now it is partitioned, i can't re use the same recipe AS I don't have access to month_key and month_key - 1. Do you have any advice on how to do this ? Should I temporarily un partition the dataset, compute my flag column and join back with my partitioned dataset ?

Thanks

Jules

 

0 Kudos
0 Replies