I am looking for any Java libraries that can parse an address out of a normal String of text. The text could contain all kinds of special and non-special 🙁 characters, but all I really want to pull out of the original string is a rough address string.
In other words, how would I pull an address out of an arbitrary String that contains one somewhere? The format doesn't really matter much, as long as the output has the street and number in it somewhere. If there aren't any libraries, would you use regular expressions for this?
I don't know of any libraries that do this… but this sounds like an excellent artificial intelligence problem 🙂
If you have any existing address books in ASCII/Unicode form, you could use them to generate regex patterns, then run all of those known address patterns against your random text and see what comes out. This way you could kind of "teach" your algorithm how to behave based on known address formats. I suspect that if any libraries for this do exist, this is roughly how they work, since there are a TON of different ways to format a street address.
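A minimal sketch of that idea in Java, assuming each address-book entry is a single street line such as `742 Evergreen Terrace` whose last word is the street type. The `generalize` rule (digit runs become `\d+`, the last word stays literal, every other word becomes `[A-Za-z]+`) is a hypothetical and deliberately crude way to "learn" a pattern; the class name, method names, and sample addresses are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressPatternScanner {

    // Learned patterns; a LinkedHashSet drops duplicates and keeps insertion order.
    private final Set<String> patterns = new LinkedHashSet<>();

    // Hypothetical generalization rule: digit runs become \d+, the last word
    // (assumed to be the street type) stays literal, other words become [A-Za-z]+.
    static String generalize(String knownAddress) {
        String[] tokens = knownAddress.trim().split("\\s+");
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].matches("\\d+")) {
                parts.add("\\d+");
            } else if (i == tokens.length - 1) {
                parts.add(Pattern.quote(tokens[i]));
            } else {
                parts.add("[A-Za-z]+");
            }
        }
        // (?!\w) stops a suffix like "St" from matching the start of "Street".
        return "\\b" + String.join("\\s+", parts) + "(?!\\w)";
    }

    // "Teach" the scanner one known address from an address book.
    void learn(String knownAddress) {
        patterns.add(generalize(knownAddress));
    }

    // Run every learned pattern against arbitrary text and collect all matches.
    List<String> findCandidates(String text) {
        List<String> found = new ArrayList<>();
        for (String regex : patterns) {
            Matcher m = Pattern.compile(regex).matcher(text);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        AddressPatternScanner scanner = new AddressPatternScanner();
        scanner.learn("742 Evergreen Terrace");     // made-up address-book entries
        scanner.learn("1600 Pennsylvania Avenue");
        scanner.learn("350 Fifth Avenue");          // same shape as above, so no new pattern
        String text = "Meet me at 221 Baker Avenue!! or maybe 31 Spooner Terrace :)";
        System.out.println(scanner.findCandidates(text));
        // prints [31 Spooner Terrace, 221 Baker Avenue]
    }
}
```

Addresses with the same shape collapse into one regex, so the pattern list stays small, but each new shape (a three-word street name, a unit number, a different word order) needs its own example before the scanner can recognize it. That is exactly why a real library would need a very large set of learned formats.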
One example is the typical US street address. You could write a regular expression that looks for two numbers with a state abbreviation in between: the house number at the start, and the ZIP code right after the state. Of course, this would only work for US street addresses and wouldn't catch them all, and you'd have to be careful to constrain your regex to avoid false positives, but you could add that regular expression to your list of possibilities.
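Here is a rough sketch of such a US-only pattern in Java. It assumes the common `number street, city, ST 12345` layout; listing the actual state codes (50 states plus DC) instead of matching any two capital letters is one way to cut down false positives. The class name and sample text are made up, and the pattern will miss plenty of valid addresses, for example street names containing digits (`5th Ave`) or apartment numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsAddressFinder {

    // Two-letter codes for the 50 states plus DC; listing them explicitly
    // rejects arbitrary capital pairs such as "NW" or "PM".
    private static final String STATES =
            "AL|AK|AZ|AR|CA|CO|CT|DE|FL|GA|HI|ID|IL|IN|IA|KS|KY|LA|ME|MD|"
            + "MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|OH|OK|OR|PA|RI|SC|"
            + "SD|TN|TX|UT|VT|VA|WA|WV|WI|WY|DC";

    // First number: a 1-6 digit house number. Then street and city as 2-60
    // non-digit characters (lazy), a comma or space, the state code, and the
    // second number: a 5-digit ZIP code with an optional -1234 suffix.
    private static final Pattern US_ADDRESS = Pattern.compile(
            "\\b\\d{1,6}\\s+[^\\d\\r\\n]{2,60}?[,\\s]\\s*(?:" + STATES + ")\\s+\\d{5}(?:-\\d{4})?\\b");

    // Return every substring of the text that looks like a US street address.
    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = US_ADDRESS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Party!!! :) Come to 123 Main St, Springfield, IL 62704 at 8pm. "
                + "Tickets cost 20 USD, 3 for 50.";
        for (String address : find(text)) {
            System.out.println(address);  // prints 123 Main St, Springfield, IL 62704
        }
    }
}
```

Excluding digits from the street/city part keeps a match from starting at some unrelated number earlier in the text (like a price) and running on until it reaches a ZIP code.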