HOW TO VALIDATE EMAIL ADDRESSES IN A COLUMN OF A FILE

Aminmin
Aminmin Dataiku DSS Core Designer, Registered Posts: 18 ✭✭✭✭

Hi Dataiku Community, I have a column in an Excel file that contains email addresses.

I would appreciate any advice on how to validate the email addresses to ensure they are correct, for example by checking that the '@' and '.' signs are in the correct places, separating the name, the domain name and the domain.

How can I create a simple flag or pattern, or use a recipe in a flow? I saw some posts on using Python or a plugin, but I do not have the capacity to understand that jargon.

Thank you in advance for your time and kind advice.

Regards

Aminmin


Operating system used: IOS

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker

    Hi,

    You can use a DSS prepare recipe with the processor: https://doc.dataiku.com/dss/latest/preparation/processors/flag-on-meaning.html

    Set the meaning to email for your column and you will be able to flag valid/invalid email addresses.

    Thanks,

  • Aminmin
    Aminmin Dataiku DSS Core Designer, Registered Posts: 18 ✭✭✭✭

    Hi AlexT, thank you for your reply.

    I will try it out.

    Btw, will this work if there are multiple email addresses in a cell, separated by a semicolon?

    Thanks and have a great weekend!!

    Regards

    Aminmin

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭

    Hi @Aminmin,

    It might be a good idea to filter out the records with multiple email addresses, for example by splitting the dataset on the occurrence of a semicolon in the email address column. Then, in the resulting multi-email dataset, split the column containing multiple email addresses on that semicolon to get individual, recognisable addresses. Just a thought, best wishes for the weekend all!

  • Aminmin
    Aminmin Dataiku DSS Core Designer, Registered Posts: 18 ✭✭✭✭

    Hi Jurre, thank you for your suggestion. I will be sure to try it out.

    Regards

    Aminmin

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭

    Another option might be to:

    • check whether the separator between email addresses is always that semicolon (with a formula processor within the prepare recipe)
    • split and fold the column with email addresses to get a single column with (hopefully) single values, which can then be processed further as @AlexT suggested.

    Just a suggestion, I'm sure you will find alternatives or variations that better suit your challenge. Would it be possible to share the one that worked best for you? Thanks!
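
For readers who do want to see the split-and-fold idea spelled out, here is a minimal sketch in plain Python. It is not Dataiku's processor, only an illustration of the same logic: each cell is split on the semicolon and every address gets its own row. The function name `split_and_fold`, the column name `email` and the sample rows are all hypothetical.

```python
def split_and_fold(rows, column, sep=";"):
    """Return one output row per non-empty address found in `column`."""
    folded = []
    for row in rows:
        raw = row.get(column) or ""  # treat a missing or empty cell as no addresses
        for part in raw.split(sep):
            address = part.strip()   # drop spaces left around the separator
            if address:
                # copy the other columns and keep a single address in `column`
                folded.append({**row, column: address})
    return folded


# Hypothetical sample data: row 1 holds two addresses in one cell.
rows = [
    {"id": 1, "email": "a@example.com; b@example.org"},
    {"id": 2, "email": "c@example.net"},
]
folded = split_and_fold(rows, "email")
```

After this step every row carries at most one address, so a per-value check such as the email meaning can be applied to each one separately.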

  • Aminmin
    Aminmin Dataiku DSS Core Designer, Registered Posts: 18 ✭✭✭✭

    Hi AlexT, I have tried your suggestion. However, Dataiku seemed to validate an e-mail as correct even when I made entries such as @.sc.com or @sccom. Please see rows 2 and 4.

    Can you please advise what Dataiku checks when the meaning is E-mail? What else do you suggest I try?

    I have also tried @Jurre's suggestion to split multiple email addresses on ;

    Thank you for your attention. I look forward to your advice.

    Regards

    Aminmin

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,983 Neuron

    Your post really contains two questions in one. With regard to separating multiple email addresses into separate columns, you should post another question, as it is a completely different issue.

    With regard to email validation, it clearly seems that Dataiku's email meaning is not clever enough to detect incorrect email addresses like @.sc.com or @sccom. Validating email addresses can be a very complex task, depending on the level of validation that you want to achieve. For instance, do you want to validate that the email domain exists? That the user account exists? The py3-validate-email Python package is one of the most complete email validators out there and supports many levels of validation.

    But since you are unwilling to get your hands dirty and code a solution in Python, the best you can really achieve is to use a regular expression:

    https://stackoverflow.com/questions/8022530/how-to-check-for-valid-email-address
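
As an illustration of the regular-expression approach, here is a minimal sketch in Python. The pattern is an assumption made for this example, not the one Dataiku uses and not a full RFC-compliant validator: it only checks the shape name@domain.tld, and it says nothing about whether the domain or the mailbox actually exists. The names `EMAIL_PATTERN` and `is_plausible_email` are hypothetical.

```python
import re

# Illustrative pattern (an assumption, deliberately simple):
# a non-empty local part, an '@', one or more dot-separated domain labels,
# and a final label made of at least two letters.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+"      # local part, at least one character
    r"@"
    r"[A-Za-z0-9-]+"          # first domain label, cannot be empty
    r"(?:\.[A-Za-z0-9-]+)*"   # optional further labels
    r"\.[A-Za-z]{2,}"         # final label, e.g. com
)


def is_plausible_email(value):
    """Return True if the whole value has the shape name@domain.tld."""
    return EMAIL_PATTERN.fullmatch(value.strip()) is not None


for candidate in ["john.doe@sc.com", "@.sc.com", "@sccom"]:
    print(candidate, is_plausible_email(candidate))
```

With this pattern the two entries mentioned earlier in the thread, @.sc.com and @sccom, are both rejected because the part before the '@' is empty. A stricter or looser pattern may suit other data better.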
