Find count of consecutive data

Options
deep_215
deep_215 Registered Posts: 9 ✭✭✭

Hi,

I need to find the occurrence of a value. Here i need to count the absent days for each employee. The counter rests if there is any present in between. For example, for 101 employee, counter starts once he is absent, but it is reset to 0 if he is present. and counter again start for absent days.

Thanks in advance. Below is the sample dataset. Here COUNT is desired output column.

EMP_IDATTENDANCECOUNT
101Present0
101Absent1
101Absent2
101Absent3
101Present0
101Absent1
101Absent2
102Present0
102Present0
102Absent1
102Absent2
102Present0

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron
    Options

    On what technology is the dataset stored? You would normaly use a Window Recipe for this kind of aggregate calculation but I don't see a way to reset the count when the same employee is present/absent multiple times. So assuming you are in SQL you can use analytical functions to calculate your table as shown in this SO post.

  • LouisDHulst
    LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 44 Neuron
    Options

    I think its possible actually! I implemented the solution in this SO post using two window recipes.

    You need to have a column to sort the Absent/Present days however. @deep_215
    I assume you have a column with the day the employee is Absent/Present?

      • The first Window recipe implements the CTE. The window settings and custom aggregation code are:
        image.png

    image.png

    • The second Window recipe is just a ROW_NUMBER - 1 so that the count starts at 1 and not 2:
      image.png




      image.png




      These are the results I get:
      image.png
  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron
    Options

    This is a good try but I don't think it's correct. I couldn't use custom aggregations on my tests since I didn't have SQL on my environment, now I do. I think the logic doesn't work when first employee record is an absense:

    Untitled.png

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron
    Options

    So I got it working in all cases but I needed 3 recipes. The first window recipe is the same as yours. The second window recipe I partitioned it as follows:

    Capture.PNG
    Capture2.PNG

    Finally in a prepare recipe I fix the count for Present with a formula:

    Capture3.PNG

    Final results:

    Capture4.PNG

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron
    Options

    But I would personally use a SQL recipe since it will be done in a single step much faster.

Setup Info
    Tags
      Help me…