Wrong date format when reading Excel file
Hi,
I have some issue when I'm reading Excel files stored in folder.
I import 1 file by month in my folder and I create a dataset to read all files and obtain one stacked table as result. For some of them (but not all), the format is not read like it's stored :
- In my Excel file, the column is store with dd/MM/yyyy format
- When I read it with Dataiku, sometimes, it's MM/dd/yyyy format stored
So, at the end, I have mixed of format in my column and I can't use parsing on prepare recipe.
Any idea to correct this ? (I would like avoid to modify manually Excel files)
Thanks in advance
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,993 Neuron
The key in your post is "sometimes". This is a strong indication that your problem is on your Excel file. I am afraid that you really need to go back to your Excel file and fix your data. Most likely you have a date column which has a combination of dates stored as dates and dates stored as text. Use the techniques on the following link to fix your spreadsheet:
https://www.macroption.com/excel-reverse-date/
"I would like avoid to modify manually Excel files" => Then don't use Excel files, use CSV files which don't introduce a mixed format issue.
Answers
-
CGB Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 17 Partner
Hi @Turribeach
,Thanks for your answer !
You're right, my problem is due to format storage in some Excel files : sometimes as date and sometimes as text.
I use technique described on the link you suggest me and it works well.
I made some tests on distinct storage, so I complete your answer if it could helps for other people :
- if I choose to store as date with "DMY" year format, Dataiku always changes that on "MDY" format when I'm reading files and I can use option "Parse dates to ISO" to obtain directly a new column with Dataiku Date format.
- if I choose to store as text, Dataiky keep my "DMY" format and it's read as string on Dataiku. To convert on Dataiku Date format, I need to use Prepare recipe ou code
The first solution is the best one to avoid a second step executed in DSS Engine (parse Date isn't compatible with SQL)
Regarding CSV file, I'm totally agreed with you : it's the best solution to avoid this issue. Unfortunately, these files are created by business team for their uses and not for mine ...
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,993 Neuron
Thanks for the update, I am sure this is a very common issue with Excel. To be honest I wouldn't blame the users, in this day and age for Microsoft to not fix this automatically when the user pastes/inputs the data is almost criminal. But they still charge everyone their 365 subscription no matter what...