Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi,
I have a column of type varchar which contains this kind of data:
ANDCZQ
QQSF
2020-01-12
01/12/2022
04/30/2000
ZERTYR
I want to clean this column by going through a Recipe and leave the non-date columns NULL and transform date format in this format DD/MM/YYYY
In the end I should have:
ANDCZQ NULL
QQSF NULL
2020-01-12 12/01/2020
01/12/2022 01/12/2022
30/04/2000 30/04/2000
ZERTYR NULL
I can't seem to find a formula in the Prepare, which can achieve what I want.
thank you in advance
Hi ,
Thanks you so much for your reply. Finally i check in SQL recipe.
Welcome to the Dataiku Community.
I always find dates and times to be a challenge.
There are a number of ways you could approach messy data like the data you are sharing.
As a first approach in DSS, I might try a visual recipe. In the visual recipe I might try the step Extract with regular expression to find things that look date like. Here is a link to this powerful visual recipe step:
https://doc.dataiku.com/dss/latest/preparation/processors/pattern-extract.html
My regular expression might be something like
(\d+[-/]\d+[-/]\d+)
The newish smart pattern recognition in DSS can be helpful in creating such a regular expression.
If you would like to learn more about writing regular expressions, you might take a look at https://regexr.com/
Image of site regexr.com parsing date like values
Once you have the date like values in the new column, you are likely to want to convert these values to actual dates with a date parser.
https://doc.dataiku.com/dss/latest/preparation/processors/date-parser.html
This page will give you more details about working with dates in DSS
https://doc.dataiku.com/dss/latest/preparation/dates.html
In particular you might want to look at date formatting for your final output format
https://doc.dataiku.com/dss/latest/preparation/dates.html#reformating-dates
If the above approach is not detailed enough to capture all of the kinds of dates you have in your column you could also go to a python library. I believe that there are several out there that might help. For example the python datefinder library. This would not be my first step because this takes a bit longer to setup in a code environment and then a Python Recipe.
Hope this helps. Do let us in the community know how you are getting on with your project.
Hi ,
Thanks you so much for your reply. Finally i check in SQL recipe.