Wrong date format when reading Excel file

Solved!
CGB
Level 3
Wrong date format when reading Excel file

Hi,

I have some issue when I'm reading Excel files stored in folder.

I import 1 file by month in my folder and I create a dataset to read all files and obtain one stacked table as result. For some of them (but not all), the format is not read like it's stored :

  • In my Excel file, the column is store with dd/MM/yyyy format
  • When I read it with Dataiku, sometimes, it's MM/dd/yyyy format stored

So, at the end, I have mixed of format in my column and I can't use parsing on prepare recipe.

Any idea to correct this ? (I would like avoid to modify manually Excel files)

Thanks in advance

 

0 Kudos
1 Solution
Turribeach

The key in your post is "sometimes". This is a strong indication that your problem is on your Excel file. I am afraid that you really need to go back to your Excel file and fix your data. Most likely you have a date column which has a combination of dates stored as dates and dates stored as text. Use the techniques on the following link to fix your spreadsheet:

https://www.macroption.com/excel-reverse-date/

"I would like avoid to modify manually Excel files" => Then don't use Excel files, use CSV files which don't introduce a mixed format issue. 

View solution in original post

0 Kudos
3 Replies
Turribeach

The key in your post is "sometimes". This is a strong indication that your problem is on your Excel file. I am afraid that you really need to go back to your Excel file and fix your data. Most likely you have a date column which has a combination of dates stored as dates and dates stored as text. Use the techniques on the following link to fix your spreadsheet:

https://www.macroption.com/excel-reverse-date/

"I would like avoid to modify manually Excel files" => Then don't use Excel files, use CSV files which don't introduce a mixed format issue. 

0 Kudos
CGB
Level 3
Author

Hi @Turribeach,

Thanks for your answer !

You're right, my problem is due to format storage in some Excel files : sometimes as date and sometimes as text.

I use technique described on the link you suggest me and it works well.

I made some tests on distinct storage, so I complete your answer if it could helps for other people :

  • if I choose to store as date with "DMY" year format, Dataiku always changes that on "MDY" format when I'm reading files and I can use option "Parse dates to ISO" to obtain directly a new column with Dataiku Date format.
  • if I choose to store as text, Dataiky keep my "DMY" format and it's read as string on Dataiku. To convert on Dataiku Date format, I need to use Prepare recipe ou code

The first solution is the best one to avoid a second step executed in DSS Engine (parse Date isn't compatible with SQL)

 

Regarding CSV file, I'm totally agreed with you : it's the best solution to avoid this issue. Unfortunately, these files are created by business team for their uses and not for mine ...

0 Kudos
Turribeach

Thanks for the update, I am sure this is a very common issue with Excel. To be honest I wouldn't blame the users, in this day and age for Microsoft to not fix this automatically when the user pastes/inputs the data is almost criminal. But they still charge everyone their 365 subscription no matter what...

0 Kudos