Update the Data "Meaning" of Email Addresses to accept RFC 6531 addresses that allow some UTF-8

User Story:

As a data analyst who works with people from around the world, I find it challenging that the email address meaning in data views does not correctly handle local parts of email addresses (the part before the @) that include characters beyond ASCII.  The use of UTF-8 strings there has been defined since at least 2012, and providers like Gmail allow such strings to appear in the local part of an email address.  Fixing this will allow more accurate evaluation of email addresses in a dataset, and less confusion about data quality.

Notes:

  • According to RFC 821, email addresses could only include a limited set of letters and numbers in the local part of the email address.  However, that RFC has been superseded a number of times.
    • Today RFC 6530, Overview and Framework for Internationalized Email, https://datatracker.ietf.org/doc/html/rfc6530 allows for UTF-8 in the local part of email addresses.
    • And RFC 6531 is specifically about SMTP Extension for Internationalized Email.  See Section 3.2 that discusses the Local Part of the email address.
  • Email addresses like cesenaünlü@example.com should not be considered errors by Dataiku.  Dataiku DSS currently flags such email addresses as errors.
  • Google announced support for third-party internationalized email addresses in Gmail back in August of 2014. https://blog.google/products/gmail/a-first-step-toward-more-global-email/
  • Here is a bit of a discussion about creating a regex to find these email addresses correctly.  https://stackoverflow.com/questions/56612022/where-can-i-find-a-java-regular-expression-for-email-va... 
  • I receive email addresses like this periodically.
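
To make the request above concrete, here is a rough Python sketch of the kind of check I have in mind. It is not how DSS implements the email meaning, and it is not taken from the Stack Overflow thread (which is about Java). It assumes the local part is a dot-separated run of the usual ASCII "atext" characters plus any non-ASCII character, and it uses a deliberately simplified domain rule. The names EMAIL_RE and looks_like_email are made up for illustration. Quoted local parts, length limits, and Unicode normalization are all ignored.

```python
import re

# ASCII characters allowed in an unquoted local part ("atext" in RFC 5322).
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~\-"
# Assumption: treat every code point from U+0080 upward as acceptable,
# which is a loose reading of the internationalized-email extension.
NON_ASCII = r"\u0080-\U0010FFFF"

# Local part: one or more atoms separated by single dots.
LOCAL_ATOM = f"[{ATEXT}{NON_ASCII}]+"
LOCAL_PART = rf"{LOCAL_ATOM}(?:\.{LOCAL_ATOM})*"

# Simplified domain: at least two labels, no leading or trailing hyphen.
LABEL_CHAR = f"A-Za-z0-9{NON_ASCII}"
DOMAIN_LABEL = f"[{LABEL_CHAR}](?:[{LABEL_CHAR}-]*[{LABEL_CHAR}])?"
DOMAIN = rf"{DOMAIN_LABEL}(?:\.{DOMAIN_LABEL})+"

EMAIL_RE = re.compile(rf"{LOCAL_PART}@{DOMAIN}")


def looks_like_email(value: str) -> bool:
    """Return True if the whole string matches the relaxed pattern."""
    return EMAIL_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["cesenaünlü@example.com", "plain@example.com", "a..b@example.com"]:
        print(candidate, looks_like_email(candidate))
```

With a pattern along these lines, cesenaünlü@example.com is accepted, while something like a..b@example.com (two dots in a row) is still rejected.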
--Tom

P.S. I know that I could create a local definition for email address as an interim workaround.  This is a request for the "standard" defined meaning in DSS to reflect these later standards.
