SQL OVER(PARTITION BY ... ORDER BY partition_key) on partitioned dataset

julesbertrand
julesbertrand Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 9 Partner

Hi,

I have a dataset partitioned by month_key (string type, not date), e.g. 202112 for December 2021.

I am working on customer data, and I want to flag changes from month to month in a column.

Original table is like

client_idmonth_key (partition key)client_attribute
000000001202110A
000000001202111A
000000001202112B
000000001202201B

And I want:

client_idmonth_key (partition key)client_attribute_change_flag
000000001202110False
000000001202111False
000000001202112True
000000001202201False

I successfully used an SQL OVER clause on an unpartitioned version of the dataset:

client_attribute != LAG(client_attribute) OVER (PARTITION BY `client_id` ORDER BY `month_key`) AS client_attribute_change_flag

Now it is partitioned, i can't re use the same recipe AS I don't have access to month_key and month_key - 1. Do you have any advice on how to do this ? Should I temporarily un partition the dataset, compute my flag column and join back with my partitioned dataset ?

Thanks

Jules

Setup Info
    Tags
      Help me…