Survey banner
The Dataiku Community is moving to a new home! We are temporary in read only mode: LEARN MORE

Find count of consecutive data

deep_215
Level 2
Find count of consecutive data

Hi,

I need to find the occurrence of a value. Here i need to count the absent days for each employee. The counter rests if there is any present in between. For example, for 101 employee, counter starts once he is absent, but it is reset to 0 if he is present. and counter again start for absent days. 

Thanks in advance. Below is the sample dataset. Here COUNT is desired output column.

EMP_IDATTENDANCECOUNT
101Present0
101Absent1
101Absent2
101Absent3
101Present0
101Absent1
101Absent2
102Present0
102Present0
102Absent1
102Absent2
102Present0
0 Kudos
5 Replies
Turribeach

On what technology is the dataset stored? You would normaly use a Window Recipe for this kind of aggregate calculation but I don't see a way to reset the count when the same employee is present/absent multiple times. So assuming you are in SQL you can use analytical functions to calculate your table as shown in this SO post.

0 Kudos

I think its possible actually! I implemented the solution in this SO post using two window recipes. 

You need to have a column to sort the Absent/Present days however. @deep_215 I assume you have a column with the day the employee is Absent/Present? 

    • The first Window recipe implements the CTE. The window settings and custom aggregation code are: 
      image.png

 

 

image.png

 

 

  • The second Window recipe is just a ROW_NUMBER - 1 so that the count starts at 1 and not 2:
    image.png




    image.png




    These are the results I get:
    image.png
Turribeach

This is a good try but I don't think it's correct. I couldn't use custom aggregations on my tests since I didn't have SQL on my environment, now I do. I think the logic doesn't work when first employee record is an absense:

Untitled.png

 

 

So I got it working in all cases but I needed 3 recipes. The first window recipe is the same as yours. The second window recipe I partitioned it as follows:

Capture.PNG
Capture2.PNG

Finally in a prepare recipe I fix the count for Present with a formula:

Capture3.PNG

Final results:

Capture4.PNG

 

But I would personally use a SQL recipe since it will be done in a single step much faster.