Check if data is date or not
Hi,
I have a column of type varchar which contains this kind of data:
ANDCZQ
QQSF
2020-01-12
01/12/2022
04/30/2000
ZERTYR
I want to clean this column by going through a Recipe and leave the non-date columns NULL and transform date format in this format DD/MM/YYYY
In the end I should have:
ANDCZQ NULL
QQSF NULL
2020-01-12 12/01/2020
01/12/2022 01/12/2022
30/04/2000 30/04/2000
ZERTYR NULL
I can't seem to find a formula in the Prepare, which can achieve what I want.
thank you in advance
Best Answer
-
Hi ,
Thanks you so much for your reply. Finally i check in SQL recipe.
Answers
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
Welcome to the Dataiku Community.
I always find dates and times to be a challenge.
There are a number of ways you could approach messy data like the data you are sharing.
As a first approach in DSS, I might try a visual recipe. In the visual recipe I might try the step Extract with regular expression to find things that look date like. Here is a link to this powerful visual recipe step:
https://doc.dataiku.com/dss/latest/preparation/processors/pattern-extract.html
My regular expression might be something like
(\d+[-/]\d+[-/]\d+)
The newish smart pattern recognition in DSS can be helpful in creating such a regular expression.
If you would like to learn more about writing regular expressions, you might take a look at https://regexr.com/
Once you have the date like values in the new column, you are likely to want to convert these values to actual dates with a date parser.
https://doc.dataiku.com/dss/latest/preparation/processors/date-parser.html
This page will give you more details about working with dates in DSS
https://doc.dataiku.com/dss/latest/preparation/dates.html
In particular you might want to look at date formatting for your final output format
https://doc.dataiku.com/dss/latest/preparation/dates.html#reformating-dates
If the above approach is not detailed enough to capture all of the kinds of dates you have in your column you could also go to a python library. I believe that there are several out there that might help. For example the python datefinder library. This would not be my first step because this takes a bit longer to setup in a code environment and then a Python Recipe.
Hope this helps. Do let us in the community know how you are getting on with your project.