Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
I have a column of type varchar which contains this kind of data:
I want to clean this column by going through a Recipe and leave the non-date columns NULL and transform date format in this format DD/MM/YYYY
In the end I should have:
I can't seem to find a formula in the Prepare, which can achieve what I want.
thank you in advance
Welcome to the Dataiku Community.
I always find dates and times to be a challenge.
There are a number of ways you could approach messy data like the data you are sharing.
As a first approach in DSS, I might try a visual recipe. In the visual recipe I might try the step Extract with regular expression to find things that look date like. Here is a link to this powerful visual recipe step:
My regular expression might be something like
The newish smart pattern recognition in DSS can be helpful in creating such a regular expression.
If you would like to learn more about writing regular expressions, you might take a look at https://regexr.com/
Once you have the date like values in the new column, you are likely to want to convert these values to actual dates with a date parser.
This page will give you more details about working with dates in DSS
In particular you might want to look at date formatting for your final output format
If the above approach is not detailed enough to capture all of the kinds of dates you have in your column you could also go to a python library. I believe that there are several out there that might help. For example the python datefinder library. This would not be my first step because this takes a bit longer to setup in a code environment and then a Python Recipe.
Hope this helps. Do let us in the community know how you are getting on with your project.