Hi Dataiku Community, I have a column in an Excel file consisting of email addresses.
I would appreciate any advice on how to validate the email addresses to ensure they are correct, for example by checking that the '@' and '.' signs are in the correct places, separating the name, domain name and domain.
How can I create a simple flag or pattern, or use a recipe in a flow? I saw some posts on using Python or a plug-in but have no brain capacity to understand that jargon 😅
Thank you in advance for your time and kind advice.
Operating system used: IOS
You can use a DSS prepare recipe with the processor: https://doc.dataiku.com/dss/latest/preparation/processors/flag-on-meaning.html
Set the meaning to email for your column and you will be able to flag valid/invalid email addresses.
Hi AlexT, I have tried your suggestion. However, Dataiku seemed to validate the e-mail as correct even when I made entries such as @.sc.com or @sccom. Please see rows 2 and 4.
Can you please advise what Dataiku checks when the meaning is E-mail? What else do you suggest I try?
Also, I have tried @Jurre's suggestion to split multiple email addresses on ';'.
Thank you for your attention. I look forward to your advice.
Hi AlexT, thank you for your reply.
I will try it out.
Btw, will this work if there are multiple email addresses in a cell, separated by semicolons?
Thanks and have a great weekend!!
Hi @Aminmin ,
It might be a good idea to filter out those records with multiple email addresses, for example by splitting the dataset on the occurrence of a semicolon in that email address column. Then, in the resulting multi-email dataset, split the column containing multiple email addresses on that semicolon to get individual, recognisable addresses. Just a thought, best wishes for the weekend all!
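For anyone who does end up trying this in Python, the split described above could be sketched roughly as follows. This is only an illustration in plain Python, not a Dataiku recipe; the helper name `split_emails` and the sample rows are made up for the example.

```python
def split_emails(cell):
    """Split one cell on semicolons and return the trimmed, non-empty parts."""
    if cell is None:
        return []
    return [part.strip() for part in str(cell).split(";") if part.strip()]


# Hypothetical sample values standing in for the email column.
rows = ["a@example.com", "b@example.com; c@example.com", ""]

# Step 1: separate the rows that contain a semicolon from those that do not.
single = [r for r in rows if ";" not in r]
multi = [r for r in rows if ";" in r]

# Step 2: break each multi-address cell into individual addresses.
exploded = [addr for r in multi for addr in split_emails(r)]
```

Each address in `exploded` can then be validated on its own, the same way as the single-address rows.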
Another option might be to:
Just a suggestion; I'm sure you will find alternatives or variations that better suit your challenge. Would it be possible to share the one that worked best for you? Thanks!
Your post really contains two questions in one. For separating multiple email addresses into separate columns, you should post another question, as it is a completely different issue.
As for email validation, it seems clear that Dataiku's email meaning is not strict enough to detect incorrect email addresses like @.sc.com or @sccom. Validating email addresses can be a very complex task, depending on the level of validation you want to achieve. For instance, do you want to check that the email domain exists? That the user account exists? The py3-validate-email Python package is one of the most complete email validators out there, supporting many levels of validation. But since you would rather not code a solution in Python, the best you can really achieve is to use a regular expression:
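As a rough idea of what a regular-expression check could look like, here is a minimal sketch. The pattern below is an illustrative assumption, not an official Dataiku pattern and not a complete validator: it only checks the overall shape (something before the '@', then a dotted domain ending in letters) and says nothing about whether the domain or mailbox exists. The names `EMAIL_RE` and `looks_like_email` are made up for the example.

```python
import re

# Illustrative pattern (an assumption, deliberately simple):
#   local part : one or more letters, digits or . _ % + -
#   domain     : one or more dot-separated labels
#   ending     : a final label of at least two letters
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def looks_like_email(value):
    """Return True if the whole value has the basic shape of an email address."""
    return isinstance(value, str) and EMAIL_RE.fullmatch(value.strip()) is not None


# The two bad entries from this thread are rejected, a normal address passes.
for candidate in ["@.sc.com", "@sccom", "jane.doe@sc.com"]:
    print(candidate, looks_like_email(candidate))
```

A pattern like this is intentionally loose; it will still accept some addresses that do not exist and may reject some unusual but valid ones.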