Discover this year's submissions to the Dataiku Frontrunner Awards and give kudos to your favorite use cases and success stories!READ MORE

Dates imported from Excel as days since 01 JAN 1900

Solved!
JMH829
Level 2
Dates imported from Excel as days since 01 JAN 1900

Hi, I am a new user, and I am doing a proof-of-concept trial with Dataiku 7.0.0. I am importing data containing dates from Excel.  The dates in the Excel cells are formatted to appear like this: 

2019-01-01 00:37:55 UTC

The text that is entered to be displayed in the cell looks like this:

1/1/2019 12:37:55 AM

When I try to import the data into Dataiku, it interprets that date column as text and returns 43466.02633101852. I understand that is Excel's way of storing the date as the number of days since January 1, 1900. I can go into Excel, change the format of the date column to the custom format of mm/dd/yyyy hh:mm, save a new Excel file, and import that into Dataiku, and then Dataiku recognizes the date.

Questions:

1. Is there a way to coax Dataiku to read the date-time value as a date that can then be parsed, starting from the original Excel file?

2. If not, is there a function to create a date value from the "Date" column that Dataiku interpreted as the value 43466.02633101852?

Thank you for your help.

2 Solutions
MickaelH
Dataiker
Dataiker

Hi,

Thanks for providing the sample file.

Indeed DSS won't interpret the custom format "2019-01-01 00:37:55 UTC" as a valid date format (we are relying on the Apache POI librairies for that). So the available options I would suggest are the following:

  • either change the custom format in excel to be the default one for date-time as you already tried: i.e. "1/1/2019 0:37:55"
  • or if your excel document is only made of a single sheet, export the corresponding sheet in CSV instead and after import in DSS parse the date string thanks to "Parse date" step in a "Prepare recipe" where you'll be able to define the custom format
  • or convert, thanks to a "Formula" step in a Prepare recipe, the number to the corresponding date. Based on Dates And Times In Excel, the formula, in case the corresponding column is entitled "Date", would be:

 

inc(inc(asDate("1900-01-01", "yyyy-MM-dd"), floor(Date)-2, "days"),floor(60*60*24*(Date-floor(Date))), "seconds")

 

I hope this helps.

Regards,

Mickaël | R&D at Dataiku

View solution in original post

JMH829
Level 2
Author

Thanks MIckael! For my application, it would be easy to change the format in Excel before trying to import the data. However, I can imagine a situation in which I would not be able to change an Excel date format because, for example, that would break the code of another user who was using the same Excel file for a different purpose. However, the INC function solved that problem. Thank you!

 

View solution in original post

7 Replies
MickaelH
Dataiker
Dataiker

Hi,

Would that be possible for you to upload a sample of the Excel file that you want to load into DSS ?

Thanks in advance.

Regards,

Mickaël | R&D at Dataiku
JMH829
Level 2
Author

Hi Mickael. I tried to upload an Excel file to use as an example, but this board does not allow .xlsx files to be uploaded. A .txt or .csv file would not preserve the type of formatting that originally prompted the question.

LisaB
Community Manager
Community Manager

Hi @JMH829 - just added the capability to upload .xlsx files if you can try again. Let us know if you encounter any other issues - thanks for posting on Community!

Lisa, Community Programs Manager at Dataiku

Don't forget to mark as "Accepted Solution" when someone provides the correct answer to your question 😉
JMH829
Level 2
Author

Thanks, Lisa! An example Excel file is now attached to this message. Column A has the dates in UTC format as described previously.

MickaelH
Dataiker
Dataiker

Hi,

Thanks for providing the sample file.

Indeed DSS won't interpret the custom format "2019-01-01 00:37:55 UTC" as a valid date format (we are relying on the Apache POI librairies for that). So the available options I would suggest are the following:

  • either change the custom format in excel to be the default one for date-time as you already tried: i.e. "1/1/2019 0:37:55"
  • or if your excel document is only made of a single sheet, export the corresponding sheet in CSV instead and after import in DSS parse the date string thanks to "Parse date" step in a "Prepare recipe" where you'll be able to define the custom format
  • or convert, thanks to a "Formula" step in a Prepare recipe, the number to the corresponding date. Based on Dates And Times In Excel, the formula, in case the corresponding column is entitled "Date", would be:

 

inc(inc(asDate("1900-01-01", "yyyy-MM-dd"), floor(Date)-2, "days"),floor(60*60*24*(Date-floor(Date))), "seconds")

 

I hope this helps.

Regards,

Mickaël | R&D at Dataiku
JMH829
Level 2
Author

Thanks MIckael! For my application, it would be easy to change the format in Excel before trying to import the data. However, I can imagine a situation in which I would not be able to change an Excel date format because, for example, that would break the code of another user who was using the same Excel file for a different purpose. However, the INC function solved that problem. Thank you!

 

JMH829
Level 2
Author

See my response to Lisa. I uploaded an Excel file with UTC dates as an example.