HOW TO VALIDATE EMAIL ADDRESSES IN A COLUMN OF A FILE
Hi Dataiku Community, I have a column in an Excel file consisting of email addresses.
I would appreciate any advice on how to validate the email addresses to ensure they are correct, for example by checking that the '@' and '.' signs are in the correct places, separating the name, the domain name and the top-level domain.
How can I create a simple flag or pattern, or use a recipe in a Flow? I saw some posts on using Python or a plugin, but I don't have the background to understand that jargon.
Thank you in advance for your time and kind advice.
Regards
Aminmin
Operating system used: IOS
Answers
-
Alexandru (Dataiker)
Hi,
You can use a DSS Prepare recipe with this processor: https://doc.dataiku.com/dss/latest/preparation/processors/flag-on-meaning.html
Set the meaning of your column to E-mail, and you will be able to flag valid/invalid email addresses.
Thanks,
-
Hi AlexT, thank you for your reply.
I will try it out.
By the way, will this work if there are multiple email addresses in a cell, separated by semicolons?
Thanks and have a great weekend!!
Regards
Aminmin
-
Jurre
Hi @Aminmin,
It might be a good idea to filter out the records with multiple email addresses, for example by splitting the dataset on the occurrence of a semicolon in the email address column. Then, in the resulting multi-email dataset, split the column containing multiple email addresses on that semicolon to get individual, recognizable addresses. Just a thought; best wishes for the weekend, all!
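For anyone willing to try a Python recipe after all, the two steps above can be sketched in pandas. This is a minimal sketch assuming the addresses sit in a column named `email` (a hypothetical name) and that `;` is the only separator; it is not a built-in Dataiku feature, just the same logic written out.

```python
import pandas as pd

# Hypothetical sample data; in practice this would come from the Excel file
df = pd.DataFrame({
    "email": [
        "alice@example.com",
        "bob@example.com;carol@example.org",
        "dave@example.net",
    ]
})

# Step 1: separate rows that contain a semicolon (i.e. multiple addresses)
has_multiple = df["email"].str.contains(";", regex=False)
single_df = df[~has_multiple]
multi_df = df[has_multiple]

# Step 2: split the multi-address column on ";" into one column per address
split_cols = multi_df["email"].str.split(";", expand=True)
# Remove stray spaces around each address
split_cols = split_cols.apply(lambda col: col.str.strip())
split_cols.columns = [f"email_{i + 1}" for i in range(split_cols.shape[1])]

print(single_df)
print(split_cols)
```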
-
Hi Jurre, thank you for your suggestion. I will be sure to try it out.
Regards
Aminmin
-
Jurre
Another option might be to:
- check whether the separator between email addresses is always that semicolon (with a Formula processor within the Prepare recipe)
- split and fold the column with email addresses to get a single column with (hopefully) single values, which can then be processed further as @AlexT suggested.
Just a suggestion; I'm sure you will find alternatives or variations that better suit your challenge. Would it be possible to share the one that worked best for you? Thanks!
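The split-and-fold idea, together with a separator check, could look roughly like this in pandas. It is a sketch under assumptions: the column names `id` and `email` are hypothetical, and treating a leftover comma or whitespace as a sign of a different separator is a heuristic, not a rule.

```python
import pandas as pd

# Hypothetical sample data: some cells hold several addresses
df = pd.DataFrame({
    "id": [1, 2, 3],
    "email": [
        "alice@example.com",
        "bob@example.com; carol@example.org",
        "dave@example.net,erin@example.net",
    ],
})

# Split on ";" and fold: one row per address (like Split + Fold in a Prepare recipe)
folded = (
    df.assign(email=df["email"].str.split(";"))
      .explode("email", ignore_index=True)
)
folded["email"] = folded["email"].str.strip()

# Drop empty strings left by trailing semicolons, e.g. "a@b.com;"
folded = folded[folded["email"] != ""].copy()

# Heuristic check: a remaining comma or whitespace suggests the cell
# used a separator other than ";"
folded["other_separator"] = folded["email"].str.contains(r"[,\s]", regex=True)

print(folded)
```

Each row of `folded` then holds a single value that can be validated on its own, and rows where `other_separator` is True are worth a manual look.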
-
Hi AlexT, I have tried your suggestion. However, Dataiku seemed to validate an e-mail as correct even when I made entries such as @.sc.com or @sccom. Please see rows 2 and 4.
Can you please advise what Dataiku checks when the meaning is E-mail? What else would you suggest I try?
Also, I have tried @Jurre's suggestion to split multiple email addresses on ";".
Thank you for your attention. I look forward to your advice.
Regards
Aminmin
-
Turribeach
So your question is really two questions in one. With regard to separating multiple email addresses into separate columns, you should post another question, as it is a completely different issue. With regard to email validation, it seems clear that Dataiku's E-mail meaning is not strict enough to reject incorrect email addresses like @.sc.com or @sccom. Validating email addresses can be a very complex task, depending on the level of validation you want to achieve. For instance, do you want to validate that the email domain exists? That the user account exists? The py3-validate-email Python package is one of the most complete email validators out there, supporting many levels of validation. But since you are unwilling to get your hands dirty coding a solution in Python, the best you could really achieve is to use a regular expression:
https://stackoverflow.com/questions/8022530/how-to-check-for-valid-email-address
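As a rough illustration of the regular-expression route, here is a Python sketch that rejects both entries mentioned earlier (@.sc.com and @sccom). The pattern is a pragmatic assumption, not a complete RFC 5322 validator: it does not check that the domain or mailbox actually exists, and it still accepts some unusual strings, such as consecutive dots in the name part. The column name `email` is hypothetical; in a DSS Python recipe the DataFrame would come from the input dataset instead of being built by hand.

```python
import re

import pandas as pd

# Pragmatic pattern (an assumption, not a full RFC 5322 validator):
#   name part: letters, digits and . _ % + -
#   then "@"
#   then one or more domain labels, each starting and ending with a letter/digit
#   and followed by "."
#   then a top-level domain of at least two letters
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+"
    r"@"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+"
    r"[A-Za-z]{2,}"
)


def is_valid_email(value) -> bool:
    """Return True if value looks like a single, well-formed email address."""
    if not isinstance(value, str):
        return False  # empty cells (NaN) and non-text values count as invalid
    # fullmatch requires the whole string to match the pattern
    return EMAIL_PATTERN.fullmatch(value.strip()) is not None


# Hypothetical sample column, including the two problem entries from the thread
df = pd.DataFrame({"email": ["john.doe@sc.com", "@.sc.com", "jane@mail.sc.com", "@sccom"]})
df["email_valid"] = df["email"].apply(is_valid_email)
print(df)
```

The resulting `email_valid` column plays the same role as the flag from the meaning-based processor, but with a stricter definition of "valid" that you can adjust by editing the pattern.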