Dates imported from Excel as days since 01 JAN 1900

Options
JMH829
JMH829 Registered Posts: 5 ✭✭✭✭

Hi, I am a new user, and I am doing a proof-of-concept trial with Dataiku 7.0.0. I am importing data containing dates from Excel. The dates in the Excel cells are formatted to appear like this:

2019-01-01 00:37:55 UTC

The text that is entered to be displayed in the cell looks like this:

1/1/2019 12:37:55 AM

When I try to import the data into Dataiku, it interprets that date column as text and returns 43466.02633101852. I understand that is Excel's way of storing the date as the number of days since January 1, 1900. I can go into Excel, change the format of the date column to the custom format of mm/dd/yyyy hh:mm, save a new Excel file, and import that into Dataiku, and then Dataiku recognizes the date.

Questions:

1. Is there a way to coax Dataiku to read the date-time value as a date that can then be parsed, starting from the original Excel file?

2. If not, is there a function to create a date value from the "Date" column that Dataiku interpreted as the value 43466.02633101852?

Thank you for your help.

Best Answers

  • MickaelH
    MickaelH Dataiker, Registered Posts: 31 Dataiker
    edited July 17 Answer ✓
    Options

    Hi,

    Thanks for providing the sample file.

    Indeed DSS won't interpret the custom format "2019-01-01 00:37:55 UTC" as a valid date format (we are relying on the Apache POI librairies for that). So the available options I would suggest are the following:

    • either change the custom format in excel to be the default one for date-time as you already tried: i.e. "1/1/2019 0:37:55"
    • or if your excel document is only made of a single sheet, export the corresponding sheet in CSV instead and after import in DSS parse the date string thanks to "Parse date" step in a "Prepare recipe" where you'll be able to define the custom format
    • or convert, thanks to a "Formula" step in a Prepare recipe, the number to the corresponding date. Based on Dates And Times In Excel, the formula, in case the corresponding column is entitled "Date", would be:

    inc(inc(asDate("1900-01-01", "yyyy-MM-dd"), floor(Date)-2, "days"),floor(60*60*24*(Date-floor(Date))), "seconds")

    I hope this helps.

    Regards,

  • JMH829
    JMH829 Registered Posts: 5 ✭✭✭✭
    Answer ✓
    Options

    Thanks MIckael! For my application, it would be easy to change the format in Excel before trying to import the data. However, I can imagine a situation in which I would not be able to change an Excel date format because, for example, that would break the code of another user who was using the same Excel file for a different purpose. However, the INC function solved that problem. Thank you!

Answers

Setup Info
    Tags
      Help me…