Dates imported from Excel as days since 01 JAN 1900

Solved!
JMH829
Level 2
Dates imported from Excel as days since 01 JAN 1900

Hi, I am a new user, and I am doing a proof-of-concept trial with Dataiku 7.0.0. I am importing data containing dates from Excel.  The dates in the Excel cells are formatted to appear like this: 

2019-01-01 00:37:55 UTC

The text that is entered to be displayed in the cell looks like this:

1/1/2019 12:37:55 AM

When I try to import the data into Dataiku, it interprets that date column as text and returns 43466.02633101852. I understand that is Excel's way of storing the date as the number of days since January 1, 1900. I can go into Excel, change the format of the date column to the custom format of mm/dd/yyyy hh:mm, save a new Excel file, and import that into Dataiku, and then Dataiku recognizes the date.

Questions:

1. Is there a way to coax Dataiku to read the date-time value as a date that can then be parsed, starting from the original Excel file?

2. If not, is there a function to create a date value from the "Date" column that Dataiku interpreted as the value 43466.02633101852?

Thank you for your help.

2 Solutions
MickaelH
Dataiker

Hi,

Thanks for providing the sample file.

Indeed DSS won't interpret the custom format "2019-01-01 00:37:55 UTC" as a valid date format (we are relying on the Apache POI librairies for that). So the available options I would suggest are the following:

  • either change the custom format in excel to be the default one for date-time as you already tried: i.e. "1/1/2019 0:37:55"
  • or if your excel document is only made of a single sheet, export the corresponding sheet in CSV instead and after import in DSS parse the date string thanks to "Parse date" step in a "Prepare recipe" where you'll be able to define the custom format
  • or convert, thanks to a "Formula" step in a Prepare recipe, the number to the corresponding date. Based on Dates And Times In Excel, the formula, in case the corresponding column is entitled "Date", would be:

 

inc(inc(asDate("1900-01-01", "yyyy-MM-dd"), floor(Date)-2, "days"),floor(60*60*24*(Date-floor(Date))), "seconds")

 

I hope this helps.

Regards,

Mickaรซl | R&D at Dataiku

View solution in original post

JMH829
Level 2
Author

Thanks MIckael! For my application, it would be easy to change the format in Excel before trying to import the data. However, I can imagine a situation in which I would not be able to change an Excel date format because, for example, that would break the code of another user who was using the same Excel file for a different purpose. However, the INC function solved that problem. Thank you!

 

View solution in original post

7 Replies
MickaelH
Dataiker

Hi,

Would that be possible for you to upload a sample of the Excel file that you want to load into DSS ?

Thanks in advance.

Regards,

Mickaรซl | R&D at Dataiku
JMH829
Level 2
Author

Hi Mickael. I tried to upload an Excel file to use as an example, but this board does not allow .xlsx files to be uploaded. A .txt or .csv file would not preserve the type of formatting that originally prompted the question.

LisaB
Dataiker

Hi @JMH829 - just added the capability to upload .xlsx files if you can try again. Let us know if you encounter any other issues - thanks for posting on Community!

Lisa, Community Programs Manager at Dataiku

Don't forget to mark as "Accepted Solution" when someone provides the correct answer to your question ๐Ÿ˜‰
JMH829
Level 2
Author

Thanks, Lisa! An example Excel file is now attached to this message. Column A has the dates in UTC format as described previously.

MickaelH
Dataiker

Hi,

Thanks for providing the sample file.

Indeed DSS won't interpret the custom format "2019-01-01 00:37:55 UTC" as a valid date format (we are relying on the Apache POI librairies for that). So the available options I would suggest are the following:

  • either change the custom format in excel to be the default one for date-time as you already tried: i.e. "1/1/2019 0:37:55"
  • or if your excel document is only made of a single sheet, export the corresponding sheet in CSV instead and after import in DSS parse the date string thanks to "Parse date" step in a "Prepare recipe" where you'll be able to define the custom format
  • or convert, thanks to a "Formula" step in a Prepare recipe, the number to the corresponding date. Based on Dates And Times In Excel, the formula, in case the corresponding column is entitled "Date", would be:

 

inc(inc(asDate("1900-01-01", "yyyy-MM-dd"), floor(Date)-2, "days"),floor(60*60*24*(Date-floor(Date))), "seconds")

 

I hope this helps.

Regards,

Mickaรซl | R&D at Dataiku
JMH829
Level 2
Author

Thanks MIckael! For my application, it would be easy to change the format in Excel before trying to import the data. However, I can imagine a situation in which I would not be able to change an Excel date format because, for example, that would break the code of another user who was using the same Excel file for a different purpose. However, the INC function solved that problem. Thank you!

 

JMH829
Level 2
Author

See my response to Lisa. I uploaded an Excel file with UTC dates as an example.