Updated Data "Meaning" of Email Addresses to accept RFC 6531 addresses that allow some UTF-8
tgb417
Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
User Story:
As a data analyst who works with people from around the world, I find it challenging that the email address meaning in data views does not currently account for local parts of email addresses (the part before the @) that include characters beyond ASCII. The use of UTF-8 strings has been defined since at least 2012, and providers like Gmail allow such strings to appear in the local part of an email address. Fixing this will allow more accurate evaluation of email addresses in a dataset and less confusion about data quality.
Notes:
- According to RFC 821, email addresses could only include a limited set of letters and numbers in the local part of the email address. However, that RFC has been superseded a number of times.
- Today RFC 6530, Overview and Framework for Internationalized Email (https://datatracker.ietf.org/doc/html/rfc6530), allows for UTF-8 in the local part of email addresses.
- RFC 6531 is specifically about the SMTP Extension for Internationalized Email. See Section 3.2, which discusses the local part of the email address.
- Email addresses like cesenaünlü@example.com should not be considered as errors by Dataiku. Dataiku DSS flags such email addresses as errors at this time.
- Google announced support for third-party internationalized email addresses in Gmail back in August of 2014. https://blog.google/products/gmail/a-first-step-toward-more-global-email/
- Here is a bit of a discussion about creating a regex to find these email addresses correctly. https://stackoverflow.com/questions/56612022/where-can-i-find-a-java-regular-expression-for-email-validation-of-foreign-chara
- I receive email addresses like this periodically.
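To make the regex idea in the notes above concrete, here is a minimal Python sketch. It is not how Dataiku DSS implements its email meaning, and it is not a complete RFC 6531 validator. The names `LOCAL_PART`, `DOMAIN`, `UNICODE_EMAIL`, `ASCII_EMAIL`, and `looks_like_email` are made up for illustration. It only shows that the same pattern accepts or rejects an address like cesenaünlü@example.com depending on whether letters are restricted to ASCII.

```python
import re
import unicodedata

# Hypothetical, simplified pattern: NOT Dataiku's definition and NOT a full
# RFC 6531 validator (no quoted local parts, no length limits, etc.).
LOCAL_PART = r"[\w.!#$%&'*+/=?^`{|}~-]+"
DOMAIN = r"[\w-]+(?:\.[\w-]+)+"

# For str patterns in Python 3, \w matches Unicode letters and digits by default.
UNICODE_EMAIL = re.compile(LOCAL_PART + "@" + DOMAIN)
# With re.ASCII, \w is limited to [a-zA-Z0-9_], which mimics an ASCII-only check.
ASCII_EMAIL = re.compile(LOCAL_PART + "@" + DOMAIN, re.ASCII)


def looks_like_email(value: str) -> bool:
    """Return True if value loosely resembles an internationalized address."""
    # Assumption: normalize to NFC so a letter followed by a combining mark
    # is treated the same as its precomposed form.
    normalized = unicodedata.normalize("NFC", value)
    return UNICODE_EMAIL.fullmatch(normalized) is not None


if __name__ == "__main__":
    sample = "cesenaünlü@example.com"
    print(looks_like_email(sample))
    print(ASCII_EMAIL.fullmatch(sample) is not None)
```

An ASCII-only check of this shape rejects the sample address, while the Unicode-aware variant accepts it, which is the behavior difference this request is about.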
Comments
-
tgb417
P.S. I know that I could create a local definition for email address as an interim workaround. This is a request for the "standard" defined meaning in DSS to reflect these later standards.