Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi,
I have a dataset partitioned by month_key (string type, not date), e.g. 202112 for December 2021.
I am working on customer data, and I want to flag changes from month to month in a column.
Original table is like
client_id | month_key (partition key) | client_attribute |
000000001 | 202110 | A |
000000001 | 202111 | A |
000000001 | 202112 | B |
000000001 | 202201 | B |
And I want:
client_id | month_key (partition key) | client_attribute_change_flag |
000000001 | 202110 | False |
000000001 | 202111 | False |
000000001 | 202112 | True |
000000001 | 202201 | False |
I successfully used an SQL OVER clause on an unpartitioned version of the dataset:
client_attribute != LAG(client_attribute) OVER (PARTITION BY `client_id` ORDER BY `month_key`) AS client_attribute_change_flag
Now it is partitioned, i can't re use the same recipe AS I don't have access to month_key and month_key - 1. Do you have any advice on how to do this ? Should I temporarily un partition the dataset, compute my flag column and join back with my partitioned dataset ?
Thanks
Jules