Enhanced Use of Standards for Data Schema Meanings

User Story:

As a data analyst working with Dataiku "meanings", I would like to understand how the Dataiku team has made its choices around meanings. I have not been able to find any documentation of the standards Dataiku uses. Knowing this would help me build a more consistent data practice and make these decisions clearer and more acceptable to my organization.

COS:

  • The use of existing standards should be the default behavior of Dataiku DSS for built-in meanings.
  • If a meaning has no current international standard, there is a clear description of the conventions Dataiku has adopted in setting up that meaning.
  • Documentation of which standards are in use, or have been established by Dataiku.
  • As Dataiku moves to follow international standards more closely, the legacy Dataiku meanings should remain easily available during a transitional period.
  • All meanings are defined in the documentation, and a comment like "DSS recognizes a few other specific meanings" should be removed from the documentation.
  • The Dataiku team makes a consistent attempt to keep the default system meanings up to date with international standards.

Nice to Have:

  • The ability to extend the built-in standards on a per-instance basis, for example:
    • The use of the XK country code, allowed for by ISO 3166, for Kosovo, whose status is still disputed by some countries around the world.
    • If internationalized email addresses are not going to be supported by default by Dataiku DSS, the ability to update that meaning.
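
To make the per-instance extension idea a bit more concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration: `build_country_validator`, `ISO_3166_1_SAMPLE` and `instance_extras` are names I made up, not Dataiku DSS APIs, and the code set is a tiny sample rather than the full ISO 3166-1 list.

```python
# Hypothetical sketch: none of these names are Dataiku DSS APIs.

# A small sample of officially assigned ISO 3166-1 alpha-2 codes (not the full list).
ISO_3166_1_SAMPLE = frozenset({"AX", "DE", "FR", "US"})


def build_country_validator(base_codes, instance_extras=()):
    """Return a predicate that accepts the base codes plus per-instance extras."""
    allowed = {code.upper() for code in base_codes}
    allowed |= {code.upper() for code in instance_extras}

    def is_valid(value):
        # Compare case-insensitively and ignore surrounding whitespace.
        return isinstance(value, str) and value.strip().upper() in allowed

    return is_valid


# Default behavior: only the standard codes are accepted.
default_validator = build_country_validator(ISO_3166_1_SAMPLE)

# Instance-level extension: the administrator adds the user-assigned code XK.
extended_validator = build_country_validator(ISO_3166_1_SAMPLE, instance_extras={"XK"})

print(default_validator("XK"))   # False
print(extended_validator("XK"))  # True
print(default_validator("ax"))   # True
```

The point of keeping the extras separate from the base set is that the default stays traceable to the standard, while each instance can document its own deviations.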

Notes:

  • The Dataiku DSS documentation does not give any clear description of the standards used in the system: https://doc.dataiku.com/dss/latest/schemas/meanings-list.html?highlight=meaning
  • The Dataiku email address meaning seems to follow RFC 2821 fairly well, but it does not seem to follow the standards for email address internationalization described in RFCs 6530, 6531, 6532, and 6533.
  • The country meaning does not seem to follow ISO 3166. It seems to have its own logic about which countries are included and which are not. For example, Åland does not register as a country.
  • The gender meaning could follow ISO/IEC 5218:2022 as its standard.
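
To illustrate the email point above, here is a rough Python sketch of the difference between an ASCII-only check and one that also admits non-ASCII characters. Both patterns are simplified assumptions of mine: they are not a full implementation of any RFC grammar, and they are not the rule DSS actually applies.

```python
import re

# Simplified character sets; an illustration only, not a full RFC grammar.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"   # ASCII characters allowed in the local part
DOMAIN = r"A-Za-z0-9-"                    # ASCII characters allowed in a domain label
NON_ASCII = "\u0080-\U0010FFFF"           # any code point outside ASCII


def compile_email_pattern(local_chars, domain_chars):
    """Build a dot-separated local part, '@', and a domain with at least one dot."""
    return re.compile(
        rf"[{local_chars}]+(\.[{local_chars}]+)*@[{domain_chars}]+(\.[{domain_chars}]+)+"
    )


# ASCII-only check, roughly the traditional address shape.
ASCII_EMAIL = compile_email_pattern(ATEXT, DOMAIN)

# Same shape, but also accepting non-ASCII characters in both parts.
# NON_ASCII goes first so the trailing "-" stays a literal hyphen.
INTL_EMAIL = compile_email_pattern(NON_ASCII + ATEXT, NON_ASCII + DOMAIN)


def matches(pattern, value):
    return pattern.fullmatch(value) is not None


for address in ["user@example.com", "josé@example.com", "用户@例子.广告"]:
    print(address, matches(ASCII_EMAIL, address), matches(INTL_EMAIL, address))
```

With these patterns, only the first address passes the ASCII-only check, while all three pass the internationalized one, which is the kind of gap I am seeing with the built-in meaning.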

 

--Tom