Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on March 24, 2022 1:06PM
Likes: 0
Replies: 2
Hi,
I have a column of type varchar which contains this kind of data:
ANDCZQ
QQSF
2020-01-12
01/12/2022
04/30/2000
ZERTYR
I want to clean this column by going through a Recipe and leave the non-date columns NULL and transform date format in this format DD/MM/YYYY
In the end I should have:
ANDCZQ NULL
QQSF NULL
2020-01-12 12/01/2020
01/12/2022 01/12/2022
30/04/2000 30/04/2000
ZERTYR NULL
I can't seem to find a formula in the Prepare, which can achieve what I want.
thank you in advance
Hi ,
Thanks you so much for your reply. Finally i check in SQL recipe.
Welcome to the Dataiku Community.
I always find dates and times to be a challenge.
There are a number of ways you could approach messy data like the data you are sharing.
As a first approach in DSS, I might try a visual recipe. In the visual recipe I might try the step Extract with regular expression to find things that look date like. Here is a link to this powerful visual recipe step:
https://doc.dataiku.com/dss/latest/preparation/processors/pattern-extract.html
My regular expression might be something like
(\d+[-/]\d+[-/]\d+)
The newish smart pattern recognition in DSS can be helpful in creating such a regular expression.
If you would like to learn more about writing regular expressions, you might take a look at https://regexr.com/
Once you have the date like values in the new column, you are likely to want to convert these values to actual dates with a date parser.
https://doc.dataiku.com/dss/latest/preparation/processors/date-parser.html
This page will give you more details about working with dates in DSS
https://doc.dataiku.com/dss/latest/preparation/dates.html
In particular you might want to look at date formatting for your final output format
https://doc.dataiku.com/dss/latest/preparation/dates.html#reformating-dates
If the above approach is not detailed enough to capture all of the kinds of dates you have in your column you could also go to a python library. I believe that there are several out there that might help. For example the python datefinder library. This would not be my first step because this takes a bit longer to setup in a code environment and then a Python Recipe.
Hope this helps. Do let us in the community know how you are getting on with your project.