Unexpected results with "trunc" function

NikolayK
NikolayK Partner, Registered Posts: 14 Partner

It's probably rather a bug report than a discussion, but there are chances I'm doing something wrong.

I'm calling trunc(MyDateColumn, 'd') on a column and for "2021-10-31T22:00:00.000Z" it returns "2021-10-31T01:00:00.000Z". Obviously, I expect to get "2021-10-31T00:00:00.000Z".

Tagged:

Answers

  • Catalina
    Catalina Dataiker, Dataiku DSS Core Designer, Registered Posts: 135 Dataiker

    Hi @NikolayK
    ,

    I did a quick test using DSS version 11.0.1 and I'm not able to reproduce this.

    Screenshot 2022-09-06 at 09.27.06.png

    Can you share more details? Is this issue happening for all the dates of the column MyDateColumn or only for this specific date? Where are you running the function trunc(MyDateColumn, 'd')?

  • NikolayK
    NikolayK Partner, Registered Posts: 14 Partner

    I did some more extensive testing and the issue seems to be quite subtle. Here's what I did:

    1. If I use trunc() in a Prepare recipe, it works fine.

    2. It breaks when trunc() is used for a pre-computed column in a Join recipe. What is more, the incumbent dataset is stored in S3 (Parquet), and the other joined dataset is stored in PostgreSQL (not sure if it has any significance). Most of the dates are truncated properly, there are only a few problems.

    Let me know if you need more details.

  • NikolayK
    NikolayK Partner, Registered Posts: 14 Partner

    Hi @CatalinaS
    . After some more work on it, it seems that it's the Join recipe that has a problem. Without any pre-computed columns, it changes "2021-10-31T00:00:00.000Z" to "2021-10-31T01:00:00.000Z".

  • Catalina
    Catalina Dataiker, Dataiku DSS Core Designer, Registered Posts: 135 Dataiker

    Thanks for the details provided! I was able to reproduce the issue. Indeed the input date "2021-10-31T00:00:00.000Z" is changed to "2021-10-31T01:00:00.000Z" after a join recipe. The issue doesn't happen if the datetime column is set to be a string rather than a date.

    One solution to this is to force the working timezone for the java processes of DSS to UTC and retry. To do this, follow Adding additional Java options and create or add a javaopts section with :
    [javaopts]
    jek.additional.opts = -Duser.timezone="UTC"
    ​A run of bin/dssadmin regenerate-config and a restart of DSS will be needed afterwards.
  • NikolayK
    NikolayK Partner, Registered Posts: 14 Partner

    Thanks @CatalinaS
    ! Do you have more details about the circumstances under which this problem may occur, so that I can assess the impact?

  • NikolayK
    NikolayK Partner, Registered Posts: 14 Partner

    @CatalinaS
    , also, is this a bug that you plan to fix in future releases?

Setup Info
    Tags
      Help me…