calculate the days between the 2 date columns excluding weekends and holidays

Tsurapaneni
Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭

Hi Team,

I have a use case where I to calculate the number of days between the 2 date columns (eg: start date, end date). For this use case the processor "Compute time differences " helped. But, when there is an additional condition added to this like excluding weekend and holidays (holidays are listed in the dataset separately, I thought the processor "flag holidays" would help but, it is taking only the input column which is either the start date or the end date but not the date range which are between them.

NOTE: Please consider the limitations of the engine: "In-SQL database engine" as I am currently using this engine to implement this solution.

Best,

Thanks !

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    that operation isn't translatable as a SQL expression, so you'll have to either code it yourself in python (or another language), or compute the day count using a Join+Group approach:

    • make a dataset with the 'accepted' days (not weekend, not holidays) over a time range covering your data, and have the dataset in the SQL db
    • join the main dataset with the accepted days dataset, on start_date <= accepted_day <= end_date
    • group on something that identifies each row of the main dataset, and count accepted_day
  • Tsurapaneni
    Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭
    edited July 17

    Hi,

    I didn't understand what are those "accepted days (not weekend, not holidays)" means here. Can you please help me to understand this.

    I am trying to implement the solution using the python code recipe but I am getting a warning as "

    DtypeWarning: Columns (33) have mixed types. Specify dtype option on import or set low_memory=False.
      if self.run_code(code, result):

    Not sure on how to overcome this. Probably, because of this it is leading to a very slow writing which took about more than 20 hours. Can you help me on how to overcome this issue. if that's the read_csv then we can use the low_memory = False option within the read_csv but while reading the recipe inputs we use dataiku.Dataset and it is not taking this argument low_memory.

    Thanks !!

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    attached an example project (done on a 8.0.5) with the idea. It's not using SQL but if you have a SQL db, you can put all_days_prepared on the db.

  • Tsurapaneni
    Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭

    Hi,

    Do you have any solution for the above python error I have mentioned in the last post. I think that might help me for now.

    Can you let me know which one has the solution out of all the files you have shared. I couldn't find the one which has the solution to this problem it's just there to compute all days but not the business days.

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    it's impossible to say anything about your python error/problem with so little information. The full python code and the error, or a full diagnostic of the failing job would be necessary.

    the zip file is a DSS project (ie open in DSS with "New Project > Import project)

  • Tsurapaneni
    Tsurapaneni Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 41 ✭✭✭✭

    Hi,

    This project gave me a partial idea but as the engine used here is local stream and not the in database sql one it broke the chain there and I implemented in python but having difficulty in the join function as it is not showing that as an invalid join for me. do you have anything that works in python to find the values between the date range of 2 columns in one dataframe and check if they are present in another dataframe date column.

Setup Info
    Tags
      Help me…