Want to Stop Rebuilding "Expensive" Parts of your Flow? Explicit Builds are the Answer!READ MORE

Check if data is date or not

Solved!
J2U_45000
Level 1
Level 1
Check if data is date or not

Hi,

I have a column of type varchar which contains this kind of data:

ANDCZQ
QQSF
2020-01-12
01/12/2022
04/30/2000
ZERTYR

I want to clean this column by going through a Recipe and leave the non-date columns NULL and transform date format in this format DD/MM/YYYY

In the end I should have:

ANDCZQ        NULL
QQSF             NULL
2020-01-12   12/01/2020
01/12/2022   01/12/2022
30/04/2000    30/04/2000
ZERTYR        NULL
I can't seem to find a formula in the Prepare, which can achieve what I want.


thank you in advance

0 Kudos
1 Solution
J2U_45000
Level 1
Level 1
Author

Hi , 

Thanks you so much for your reply. Finally i check in SQL recipe. 

 

View solution in original post

0 Kudos
2 Replies
tgb417
Neuron
Neuron

@J2U_45000 

Welcome to the Dataiku Community.

I always find dates and times to be a challenge. 

There are a number of ways you could approach messy data like the data you are sharing. 

As a first approach in DSS, I might try a visual recipe.  In the visual recipe I might try the step Extract with regular expression to find things that look date like.  Here is a link to this powerful visual recipe step:

https://doc.dataiku.com/dss/latest/preparation/processors/pattern-extract.html

My regular expression might be something like

(\d+[-/]\d+[-/]\d+)

The newish smart pattern recognition in DSS can be helpful in creating such a regular expression.

If you would like to learn more about writing regular expressions, you might take a look at https://regexr.com/

Image of site regexr.com parsing date like valuesImage of site regexr.com parsing date like values

Once you have the date like values in the new column, you are likely to want to convert these values to actual dates with a date parser.  

https://doc.dataiku.com/dss/latest/preparation/processors/date-parser.html

This page will give you more details about working with dates in DSS

https://doc.dataiku.com/dss/latest/preparation/dates.html

In particular you might want to look at date formatting for your final output format

https://doc.dataiku.com/dss/latest/preparation/dates.html#reformating-dates

If the above approach is not detailed enough to capture all of the kinds of dates you have in your column you could also go to a python library.  I believe that there are several out there that might help.  For example the python datefinder library.  This would not be my first step because this takes a bit longer to setup in a code environment and then a Python Recipe.

Hope this helps.  Do let us in the community know how you are getting on with your project.

 

--Tom
0 Kudos
J2U_45000
Level 1
Level 1
Author

Hi , 

Thanks you so much for your reply. Finally i check in SQL recipe. 

 

0 Kudos