Postal Address Parsing

tgb417


I'm working on a record linkage project with a non-profit, so there is little budget for third-party processing. As part of deduplication, I've done a first pass at parsing and formatting email addresses, names, and phone numbers in a standard way.

I'm now moving on to parsing and standardizing postal addresses.

My question is: without paying a third party to do a USPS Coding Accuracy Support System (CASS) match, what are folks doing to format their postal addresses consistently?

I've seen some descriptions of using the C library libpostal from both Python and PostgreSQL. I've also seen descriptions of using the US Census Geocoder, and there may be other geocoders. However, the geocoding price needs to be very low.
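For reference, here is a minimal sketch of the kind of zero-cost, rule-based cleanup that could serve as a baseline before trying libpostal or a geocoder. It uses only the Python standard library. The `ABBREVIATIONS` table and the `normalize_address` function are illustrative names made up for this example, and the table is a tiny hand-picked subset of common USPS-style abbreviations, not a complete list.

```python
import re

# Illustrative, deliberately tiny lookup table (an assumption for this sketch).
# A real table would cover far more suffixes, directionals, and unit types.
ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "BOULEVARD": "BLVD",
    "DRIVE": "DR",
    "LANE": "LN",
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
    "APARTMENT": "APT",
    "SUITE": "STE",
}


def normalize_address(raw: str) -> str:
    """Upper-case, strip punctuation, collapse whitespace, abbreviate known words."""
    text = raw.upper()
    # Replace everything except letters, digits, whitespace, '#' and '-' with a space.
    text = re.sub(r"[^A-Z0-9\s#-]", " ", text)
    # split() with no arguments also collapses runs of whitespace.
    tokens = text.split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize_address("123 North Main Street, Apt. 4"))
    print(normalize_address("123 n. main st apt 4"))
```

Both sample inputs normalize to the same string, which is what matters for deduplication. A word-by-word lookup like this has obvious limits: it would also shorten a street that is actually named "North Street" to "N ST", and it cannot split an address into components.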

What are others using successfully?
