Expand valid country names for country meaning to match output of the reverse geocoding processor

The reverse geocoding processor is incredibly useful for geographic datasets. It doesn't always get the answers right: it sometimes classifies hierarchies inconsistently, and sometimes it knows the city for a geopoint but not the country. Even so, it's very powerful for quickly turning a set of coordinates into meaningful geographic classifications, and it even returns local-language place names, which is fantastic. There's just one issue: some of the country names it outputs aren't recognized as valid by the country meaning. Here are a few examples from an airports geo dataset I've been working with recently. To produce them, I created a geopoint from the coordinates, then processed it with the reverse geocoding processor. About two percent of the output from my sample is invalid, including:

  • Russian Federation
  • Fr. Polynesia
  • Congo-Kinshasa
  • The Bahamas
  • People's Republic of China
  • Kingdom of Lesotho
  • Congo-Brazzaville
  • U.S. Minor Outlying Is.
  • The Gambia
  • U.S. Virgin Is.
  • St. Pierre and Miquelon
  • Somaliland
  • Akrotiri
  • Côte d'Ivoire

These tend to be minor naming discrepancies or countries with varying levels of international recognition, but it would be nice if the country meaning accepted everything the reverse geocoder can output, or at least if there were a mapping I could apply as a second step to make the country names valid. This is a minor issue, but it does make it difficult to explore geographic data when I want to confirm that reverse geocoding was successful.
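Until something like that exists, a workaround might be a hand-written lookup applied after the reverse geocoding step, for example in a Python recipe. The sketch below is hypothetical: `COUNTRY_NAME_FIXES` and `normalize_country` are illustrative names, and the target spellings are my assumptions about what the country meaning accepts, not values confirmed against it. Entries with no obvious target (Somaliland, Akrotiri, and the Côte d'Ivoire spelling) are deliberately left unmapped and pass through unchanged.

```python
# Hypothetical second-step mapping from reverse-geocoder country names to
# spellings the country meaning is ASSUMED to accept. Verify every target
# against the valid values in your own DSS instance before relying on it.
COUNTRY_NAME_FIXES = {
    "Russian Federation": "Russia",
    "Fr. Polynesia": "French Polynesia",
    "Congo-Kinshasa": "Democratic Republic of the Congo",
    "Congo-Brazzaville": "Republic of the Congo",
    "The Bahamas": "Bahamas",
    "People's Republic of China": "China",
    "Kingdom of Lesotho": "Lesotho",
    "U.S. Minor Outlying Is.": "United States Minor Outlying Islands",
    "The Gambia": "Gambia",
    "U.S. Virgin Is.": "U.S. Virgin Islands",
    "St. Pierre and Miquelon": "Saint Pierre and Miquelon",
    # Somaliland, Akrotiri and Côte d'Ivoire are intentionally not mapped.
}


def normalize_country(name):
    """Return the mapped spelling if one is known, else the whitespace-stripped input."""
    if name is None:
        return None  # keep missing values missing
    cleaned = name.strip()
    return COUNTRY_NAME_FIXES.get(cleaned, cleaned)


if __name__ == "__main__":
    sample = ["Fr. Polynesia", "France", "Somaliland", " The Gambia "]
    print([normalize_country(n) for n in sample])
```

In a pandas-based recipe this could be applied with `df["country"] = df["country"].map(normalize_country)`, where `country` is a hypothetical column name. A plain dictionary keeps the mapping easy to review and extend as new mismatches turn up.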

If the country meaning accepted these names, it would be easy to check whether reverse-geocoded results are valid country names, especially if they've been processed further after the reverse geocoding step.