Find count of consecutive data
Hi,
I need to find the occurrence of a value. Here i need to count the absent days for each employee. The counter rests if there is any present in between. For example, for 101 employee, counter starts once he is absent, but it is reset to 0 if he is present. and counter again start for absent days.
Thanks in advance. Below is the sample dataset. Here COUNT is desired output column.
EMP_ID | ATTENDANCE | COUNT |
101 | Present | 0 |
101 | Absent | 1 |
101 | Absent | 2 |
101 | Absent | 3 |
101 | Present | 0 |
101 | Absent | 1 |
101 | Absent | 2 |
102 | Present | 0 |
102 | Present | 0 |
102 | Absent | 1 |
102 | Absent | 2 |
102 | Present | 0 |
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,172 Neuron
On what technology is the dataset stored? You would normaly use a Window Recipe for this kind of aggregate calculation but I don't see a way to reset the count when the same employee is present/absent multiple times. So assuming you are in SQL you can use analytical functions to calculate your table as shown in this SO post.
-
LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 54 Neuron
I think its possible actually! I implemented the solution in this SO post using two window recipes.
You need to have a column to sort the Absent/Present days however. @deep_215
I assume you have a column with the day the employee is Absent/Present?- The first Window recipe implements the CTE. The window settings and custom aggregation code are:
- The second Window recipe is just a ROW_NUMBER - 1 so that the count starts at 1 and not 2:
These are the results I get:
- The first Window recipe implements the CTE. The window settings and custom aggregation code are:
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,172 Neuron
This is a good try but I don't think it's correct. I couldn't use custom aggregations on my tests since I didn't have SQL on my environment, now I do. I think the logic doesn't work when first employee record is an absense:
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,172 Neuron
So I got it working in all cases but I needed 3 recipes. The first window recipe is the same as yours. The second window recipe I partitioned it as follows:
Finally in a prepare recipe I fix the count for Present with a formula:
Final results:
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,172 Neuron
But I would personally use a SQL recipe since it will be done in a single step much faster.