Wrong date format when reading Excel file

CGB
CGB Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 17 Partner

Hi,

I have some issue when I'm reading Excel files stored in folder.

I import 1 file by month in my folder and I create a dataset to read all files and obtain one stacked table as result. For some of them (but not all), the format is not read like it's stored :

  • In my Excel file, the column is store with dd/MM/yyyy format
  • When I read it with Dataiku, sometimes, it's MM/dd/yyyy format stored

So, at the end, I have mixed of format in my column and I can't use parsing on prepare recipe.

Any idea to correct this ? (I would like avoid to modify manually Excel files)

Thanks in advance

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,993 Neuron
    Answer ✓

    The key in your post is "sometimes". This is a strong indication that your problem is on your Excel file. I am afraid that you really need to go back to your Excel file and fix your data. Most likely you have a date column which has a combination of dates stored as dates and dates stored as text. Use the techniques on the following link to fix your spreadsheet:

    https://www.macroption.com/excel-reverse-date/

    "I would like avoid to modify manually Excel files" => Then don't use Excel files, use CSV files which don't introduce a mixed format issue.

Answers

  • CGB
    CGB Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 17 Partner

    Hi @Turribeach
    ,

    Thanks for your answer !

    You're right, my problem is due to format storage in some Excel files : sometimes as date and sometimes as text.

    I use technique described on the link you suggest me and it works well.

    I made some tests on distinct storage, so I complete your answer if it could helps for other people :

    • if I choose to store as date with "DMY" year format, Dataiku always changes that on "MDY" format when I'm reading files and I can use option "Parse dates to ISO" to obtain directly a new column with Dataiku Date format.
    • if I choose to store as text, Dataiky keep my "DMY" format and it's read as string on Dataiku. To convert on Dataiku Date format, I need to use Prepare recipe ou code

    The first solution is the best one to avoid a second step executed in DSS Engine (parse Date isn't compatible with SQL)

    Regarding CSV file, I'm totally agreed with you : it's the best solution to avoid this issue. Unfortunately, these files are created by business team for their uses and not for mine ...

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,993 Neuron

    Thanks for the update, I am sure this is a very common issue with Excel. To be honest I wouldn't blame the users, in this day and age for Microsoft to not fix this automatically when the user pastes/inputs the data is almost criminal. But they still charge everyone their 365 subscription no matter what...

Setup Info
    Tags
      Help me…