I am trying to extract a US address from a text.
So if I have the following variations of text, I’d like to extract the address portion from each:
Today is a good day to meet up at a
bar. the address is 123 fake street,
NY, 23423-3423
just came from 423 Elm Street, kk, 34223 ...had awesome time
blah blah bleh blah 23414 Fake Terrace, MM something else
experimented my teleporter to get to work but reached at 2423 terrace NY
If someone can provide some starting points then I can mold it for other variations.
At some point, you’d have to clarify what you consider an address to be.
Does an address just have a street number and street name?
Does an address have a street name, and a city name?
Does an address have a city name, a state name?
Does an address have a city name, a state abbreviation, and a zip code? What format is the zip code in?
It’s easy to see how you can run into trouble quickly.
This obviously wouldn’t catch everything, but maybe you could match strings that start with a street number, have a state abbreviation somewhere in the middle, and end in a zip code. The reliability of this would depend greatly on what sort of text you are using as input. I.e., if there are a lot of other numbers in the text, this could be completely useless.
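As a rough illustration of that heuristic, here is a minimal Python sketch. The pattern, the `ADDRESS_RE` name, and the `extract_addresses` helper are all assumptions made up for this example. It treats any two-letter token as a "state" (the sample inputs use made-up ones like `kk`), and it accepts a 5-digit zip with an optional `-NNNN` suffix.

```python
import re

# Hypothetical pattern: street number, then some words, then a two-letter
# "state" token, then a 5-digit zip with an optional 4-digit extension.
ADDRESS_RE = re.compile(
    r"\b\d+\s+"              # street number followed by whitespace
    r"[A-Za-z .,]+?"         # street name and separators (no digits, no newlines)
    r"\b[A-Za-z]{2}\b"       # two-letter token; not checked against real states
    r",?\s+"                 # optional comma, then whitespace
    r"\d{5}(?:-\d{4})?\b"    # zip code: 12345 or 12345-6789
)


def extract_addresses(text):
    """Return every substring of text that looks like an address."""
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]


if __name__ == "__main__":
    samples = [
        "bar. the address is 123 fake street, NY, 23423-3423",
        "just came from 423 Elm Street, kk, 34223 ...had awesome time",
        "blah blah bleh blah 23414 Fake Terrace, MM something else",
        "experimented my teleporter to get to work but reached at 2423 terrace NY",
    ]
    for s in samples:
        print(extract_addresses(s))
```

With these samples, only the first two produce a match, because the last two have no zip code to anchor the end of the address. Loosening the pattern to catch them (for example, making the zip optional) would also make it match a lot more stray numbers, which is exactly the trade-off described above.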
possible regex
sample input
matches
regex explanation