Check if data is date or not

J2U_45000
J2U_45000 Partner, Registered Posts: 3 Partner

Hi,

I have a column of type varchar which contains this kind of data:

ANDCZQ
QQSF
2020-01-12
01/12/2022
04/30/2000
ZERTYR

I want to clean this column by going through a Recipe and leave the non-date columns NULL and transform date format in this format DD/MM/YYYY

In the end I should have:

ANDCZQ NULL
QQSF NULL
2020-01-12 12/01/2020
01/12/2022 01/12/2022
30/04/2000 30/04/2000
ZERTYR NULL
I can't seem to find a formula in the Prepare, which can achieve what I want.


thank you in advance

Tagged:

Best Answer

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
    edited July 17

    @J2U_45000

    Welcome to the Dataiku Community.

    I always find dates and times to be a challenge.

    There are a number of ways you could approach messy data like the data you are sharing.

    As a first approach in DSS, I might try a visual recipe. In the visual recipe I might try the step Extract with regular expression to find things that look date like. Here is a link to this powerful visual recipe step:

    https://doc.dataiku.com/dss/latest/preparation/processors/pattern-extract.html

    My regular expression might be something like

    (\d+[-/]\d+[-/]\d+)

    The newish smart pattern recognition in DSS can be helpful in creating such a regular expression.

    If you would like to learn more about writing regular expressions, you might take a look at https://regexr.com/

    Image of site regexr.com parsing date like values

    Once you have the date like values in the new column, you are likely to want to convert these values to actual dates with a date parser.

    https://doc.dataiku.com/dss/latest/preparation/processors/date-parser.html

    This page will give you more details about working with dates in DSS

    https://doc.dataiku.com/dss/latest/preparation/dates.html

    In particular you might want to look at date formatting for your final output format

    https://doc.dataiku.com/dss/latest/preparation/dates.html#reformating-dates

    If the above approach is not detailed enough to capture all of the kinds of dates you have in your column you could also go to a python library. I believe that there are several out there that might help. For example the python datefinder library. This would not be my first step because this takes a bit longer to setup in a code environment and then a Python Recipe.

    Hope this helps. Do let us in the community know how you are getting on with your project.

Setup Info
    Tags
      Help me…