Troubles with dates in different formats when importing and then trying to parse

Options
Vaslo
Vaslo Registered Posts: 5
edited July 16 in Using Dataiku

This feels like it should be a simple thing, but this is taking hours to fix this simple issue.

I have uploaded data with dates that are quarterly and the first day of the quarter: 01/01/2022 for example which would be MM/dd/yyyy I think.

As soon as I import it immediately changes the dates to look like this: 2022-01-01. I'll actually eventually want this but I'll explain my issue shortly. No amount of forcing it to "string" and "text" will make it display as original, even after refresh (why?? Shouldn't string/text show me the raw string as it was with the slashes?)

Then when I get to the parse step and want it to be parsed as Dataiku suggests (yyyy-MM-dd), Dataiku gets totally lost and I start getting the following warnings which start breaking stuff downstream:

[15:49:21] [WARN] [dku.shaker.date] - SHAKER_BAD_DATE: Failed to parse '1/1/2010' using format 'yyyy-MM-dd', at position 1
[15:49:21] [WARN] [dku.shaker.date] - SHAKER_BAD_DATE: Failed to parse '4/1/2010' using format 'yyyy-MM-dd', at position 1
[15:49:21] [WARN] [dku.shaker.date] - SHAKER_BAD_DATE: Failed to parse '7/1/2010' using format 'yyyy-MM-dd', at position 1
[15:49:21] [WARN] [dku.shaker.date] - SHAKER_BAD_DATE: Failed to parse '10/1/2010' using format 'yyyy-MM-dd', at position 2
[15:49:21] [WARN] [dku.shaker.date] - SHAKER_BAD_DATE: Failed to parse '1/1/2011' using format 'yyyy-MM-dd', at position 1

1) Why can't I make the original date a Text/String so I'm seeing what was in the original CSV with the date format 01/01/2022 or whatever. Why is it insisting on changing to the dash format?

2) Why is it failing to parse? I suspect it's because the original date is in the 1/1/2010 format but Dataiku never shows it to me that way, so this is very confusing.

Thanks.


Operating system used: Windows 10

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,727 Neuron
    Options

    1) Time series needs a date data type not a string. You can keep the date as a string in a different column for display purposes.

    2) Dataiku will suggest the date parsing format ('yyyy-MM-dd') based on what it thinks it fits the first few rows. However you can specify the correct parsing format yourself.

  • Vaslo
    Vaslo Registered Posts: 5
    Options

    Thanks for your reply and sorry for the incorrect title - something about doing it when I'm not logged in.

    Let me be a little clearer now that I've spent more time on it and your answer has helped me better understand. I have a csv that I import. In the csv it's 01/01/2024 for the date format. At some point I must have erroneously changed it to be like 2024-01-01. Now it seems stuck that way, even when I convert it to string/text in the attached screenshot.

    How would I have the original import step push it back to the original format with /'s instead of the dashes, and I can convert to dashes somehow later? I can reimport and start over but then it insists on deleting everything else down the chain.

    Thanks!

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,727 Neuron
    Options

    CSVs don't have data types, is a simple comma separated text file. When you upload a CSV file to Dataiku it will load all columns as String by default. Maybe you changed this on the Dataset => Settings => Schema tab? Either way the issue here is that you converted it to a Date and then back to a string with a different format. You will need to find where in your flow you are doing those conversions and remove them. Explore each dataset in your flow and review the data type of the date column.

Setup Info
    Tags
      Help me…