I'm working on a record linkage project with a non-profit, so there is little budget for third-party processing. As part of deduplication, I've done a first pass at parsing and formatting email addresses, names, and phone numbers in a standard way.
I'm now moving on to parsing and standardizing postal addresses.
My question: without paying a third party for a USPS Coding Accuracy Support System (CASS) match, what are folks doing to format their postal addresses consistently?
I've seen some descriptions of using the C library libpostal from both Python and PostgreSQL. I've also seen descriptions of using the US Census Geocoder, and there may be other geocoders. However, the geocoding price needs to be very low.
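For context, here is roughly how I understand the Census Geocoder option would be called. This is only a minimal sketch: the function name `census_geocode_url` is my own, and the `onelineaddress` endpoint and the `Public_AR_Current` benchmark are what I believe the public documentation describes, so treat both as assumptions to verify. It only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint for single-line address lookups (check the Census Geocoder docs).
CENSUS_ONELINE_URL = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"


def census_geocode_url(address: str, benchmark: str = "Public_AR_Current") -> str:
    """Build a Census Geocoder request URL for one address (hypothetical helper)."""
    # Collapse runs of whitespace so equivalent inputs produce the same URL.
    one_line = " ".join(address.split())
    params = {"address": one_line, "benchmark": benchmark, "format": "json"}
    # urlencode percent-encodes commas and turns spaces into '+'.
    return f"{CENSUS_ONELINE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(census_geocode_url("4600 Silver Hill Rd, Washington, DC 20233"))
```

As far as I can tell, the JSON reply includes a standardized form of the matched address, which is the part I would use as a deduplication key rather than the coordinates.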
What are others using successfully?