Survey banner
Switching to Dataiku - a new area to help users who are transitioning from other tools and diving into Dataiku!

# How to calculate day difference between rows with specific conditions

Level 2
###### How to calculate day difference between rows with specific conditions

Hello, I have a dataset like below and want to calculate the "CV" day difference since "Visit" per "ID", as the "CV_days_after_visit" in the expected result.

Is it possible to do by using the visual recipes of Dataiku (without coding)?

Existing dataset:

 ID Action date A2A Visit 2023/4/1 A2A CV 2023/4/5 A2A CV 2023/4/7 BB3 Visit 2023/5/5 BB3 Visit 2023/5/29 BB3 CV 2023/5/30 5EE Visit 2023/6/1 5EE CV 2023/6/10

Expected results:

 ID Action date CV_days_after_visit (expected result) A2A Visit 2023/4/1 A2A CV 2023/4/5 4 A2A CV 2023/4/7 6 BB3 Visit 2023/5/5 BB3 Visit 2023/5/29 BB3 CV 2023/5/30 1 5EE Visit 2023/6/1 5EE CV 2023/6/10 9
5 Replies
Dataiker

Hi,
You can use a Visual Window recipe :
https://knowledge.dataiku.com/latest/data-preparation/visual-recipes/concept-window-recipe.html
Make sure to parse the date first and set the order column partition column, window definition, and then Lagdiff - to days.

Thanks

Level 2
Author

Hi, AlexT

I did try the Window recipe with the same setting, but the Lagdiff is calculated based on the previous 1 row only.

As shown in my dataset, the CV and the last Visit dates are not necessarily next to each other...

Dataiker

Indeed, I don't see a way to do this from the Visual recipe alone.

I might try the following, breaking down the problem into smaller more solvable steps.

0. First Iโd need to parse the dates into a form on which I can do calculations and joins

1. Use a split recipie to break up the dataset into a Visit and CV data set.

2. Find candidate visit and CV pairs.  Likely using a left join to connect the visit data set with the cv data sets using date.  However not using Equal but greater than or equal on the date.  So we are joining each visit to all CV after the Visit.  This produces a long list of possible CV for a visit.

3. Then Iโd use a window function to order all of the visit CV sets, and adding a row number.  I might also use a prepare or filter recipe prior to this to remove items that are not appropriate.

4. Then Iโd filter this large set of candidate CVs to the right one.  Likely the most recent

5. Calculate the time difference.

This may not do exactly what you want to do.  The idea is break up the dataset into two data sets, Visit and CV.  Join the two datasets to find the candidate pairs.  And eliminate the ones that are not useful.

Then stack the Visit set and this new CV data set to get your resulting set.

Once one gets it working in the broken down approach, then, Iโd consider if there are ways to simplify.

Good luck with this.  Although you are not likely to be able to do this with a single Dataiku visual recipie step.  I do think you can make the transformation you are trying to achieve.  Over the years, Iโve done a number of projects like this with Dataiku DSS.

Good luck, let us know how you get one with this.

--Tom

So I got it working but you need a Window Recipe and a Prepare Recipe. First the window recipe:

Nothing on Aggregations as we need a Custom Aggregation:

The output has now a new Visit_Date column:

Finally we use a Prepare recipe with a If => Then processor to calculate the difference in days:

PS: Custom Aggregations are not supported in the DSS engine so make sure you have your datasets in a SQL database.