How to shift data in a column
I have a main question and, as part of my attempted solution to it, a follow-up question:
Main question:
I have a column in my time series data (let's call it status) that is populated with on/off binary data.
I need to create a column that counts the days since status was last on. When the status is on, the counter is set to zero; for every off status after that, the counter counts the days, and it resets to zero the next time the status is on. I know how to do this with a for loop over my dataframe in pandas, but I don't know how to do it in Dataiku.
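For reference, the counter described above can also be written without a for loop in pandas. This is only a minimal sketch on made-up data: it assumes one row per day, and the column names date, status and days_since_on are placeholders, not taken from the real dataset.

```python
import pandas as pd

# Made-up daily data; column names are assumptions for illustration only.
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=7, freq="D"),
    "status": ["off", "on", "off", "off", "on", "off", "off"],
})

is_on = df["status"].eq("on")

# Every "on" row starts a new block; cumsum gives each row the id of its block.
block = is_on.cumsum()

# Position inside the block: 0 on the "on" row, then 1, 2, ... on the rows after it.
df["days_since_on"] = df.groupby(block).cumcount()

# Rows before the first "on" have no reference point, so leave them empty (NaN).
df["days_since_on"] = df["days_since_on"].where(block > 0)

print(df)
```

Because the counter is really a row count, it only equals days when there is exactly one row per day.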
Follow-up question:
One way I am thinking of doing this is with a formula for a cell in the Prepare recipe. For that, I need to know the previous day's value in the status column. So I basically need to shift the status column by one day so that I can use the shifted value in my (def process(row)) function code.
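In pandas terms, the shift described here would look roughly like the sketch below. It assumes the rows are already sorted by time with one row per day; the data and the prev_status column name are made up for illustration.

```python
import pandas as pd

# Made-up daily data, already sorted by date; names are assumptions.
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=6, freq="D"),
    "status": ["off", "on", "off", "off", "on", "off"],
})

# shift(1) moves every value down by one row, so each row sees the
# status of the row before it. The first row has no predecessor and gets NaN.
df["prev_status"] = df["status"].shift(1)

print(df)
```

With the previous status available as an ordinary column, a row-by-row function only needs to read values from its own row.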
Operating system used: Windows
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,167 Neuron
Both your main question and follow up question can be solved using a Window recipe:
https://knowledge.dataiku.com/latest/data-preparation/visual-recipes/tutorial-window-recipe.html
If you can't figure out how to partition the data, attach an Excel file with some dummy data.
-
Here is the data.
The time is in 5-minute increments. I want to have a column for the number of "days" since the last time the valve was open.
In this data, the first open is in line 169, and there are a bunch of opens on the same day (Sep 28 2020). The next open is in line 58788, on April 20 2021. So the new column should be zero for all rows on Sep 28 2020, go up by one each day until it reaches 211 on April 19 2021, and then be zero again on April 20.
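To make the expected output concrete, here is a rough pandas sketch of that day-level logic on made-up 5-minute data. The column names timestamp, valve and days_since_open and the open/closed values are assumptions, not the real file's layout. It treats a day as "open" from its first row if the valve opened at any time that day, which is how I read the description above.

```python
import pandas as pd

# Made-up 5-minute readings over five days; names and values are assumptions.
ts = pd.date_range("2020-09-27 00:00", "2020-10-01 23:55", freq="5min")
df = pd.DataFrame({"timestamp": ts, "valve": "closed"})
opens = pd.to_datetime(["2020-09-28 14:00", "2020-09-28 14:05", "2020-10-01 09:00"])
df.loc[df["timestamp"].isin(opens), "valve"] = "open"

# Calendar day of each reading (time set to midnight).
day = df["timestamp"].dt.normalize()

# The day itself on rows where the valve is open, empty (NaT) elsewhere.
open_day = day.where(df["valve"].eq("open"))

# Spread the open day over every row of that day, then carry it forward
# across the following days until the next day with an open.
last_open_day = open_day.groupby(day).transform("max").ffill()

# Whole days between each row's day and the most recent open day.
# Rows before the first open day stay empty (NaN).
df["days_since_open"] = (day - last_open_day).dt.days

print(df.groupby(day)["days_since_open"].first())
```

On this toy data the column is empty on Sep 27, 0 for all of Sep 28, 1 on Sep 29, 2 on Sep 30, and 0 again for all of Oct 1.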