I want to write a parser that searches a large number of texts for city names and other geographic terms, for example Sydney, Tower Bridge, Munich…
My idea is to look up words in a local database of geographic information (such as http://www.geonames.org/, where I can download city information). If there is a hit, the database returns the latitude and longitude coordinates. To be looked up, a word must start with an uppercase letter and be longer than 2 characters.
But I think the performance will be very poor. Each text contains 10 to 100 words.
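A minimal sketch of the lookup described above, written in Python since the question names no language. It assumes the gazetteer is loaded once into an in-memory dictionary instead of querying the database for every word, so each lookup becomes a hash-table access. It also assumes the tab-separated column layout of the GeoNames dump files (name in the 2nd column, latitude and longitude in the 5th and 6th). The names `load_gazetteer`, `find_places` and the `max_words` limit for multi-word names such as "Tower Bridge" are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Word tokenizer; \w also matches non-ASCII letters such as the "ü" in "München".
WORD = re.compile(r"\w+")


def load_gazetteer(rows: Iterable[str]) -> Dict[str, Coord]:
    """Build a name -> (lat, lon) dictionary once, so later lookups are in memory.

    Assumption: rows follow the tab-separated GeoNames dump layout
    (geonameid, name, asciiname, alternatenames, latitude, longitude, ...).
    """
    gazetteer: Dict[str, Coord] = {}
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # skip malformed rows
        # Many places share a name; this sketch simply keeps the first one seen.
        gazetteer.setdefault(cols[1], (float(cols[4]), float(cols[5])))
    return gazetteer


def find_places(text: str, gazetteer: Dict[str, Coord],
                max_words: int = 3) -> List[Tuple[str, Coord]]:
    """Return (name, (lat, lon)) for every gazetteer name found in the text."""
    words = WORD.findall(text)
    hits: List[Tuple[str, Coord]] = []
    i = 0
    while i < len(words):
        word = words[i]
        # Candidate filter from the question: uppercase first letter, length > 2.
        if not (word[:1].isupper() and len(word) > 2):
            i += 1
            continue
        # Try the longest phrase first, so "Tower Bridge" wins over "Tower".
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in gazetteer:
                hits.append((candidate, gazetteer[candidate]))
                i += n
                break
        else:
            i += 1  # no gazetteer name starts at this word
    return hits


# Illustrative rows only (made-up IDs, rounded coordinates), not real GeoNames records.
rows = [
    "1\tSydney\tSydney\t\t-33.87\t151.21\tP\tPPL\tAU",
    "2\tTower Bridge\tTower Bridge\t\t51.51\t-0.08\tS\tBDG\tGB",
]
print(find_places("I flew to Sydney, then saw Tower Bridge.", load_gazetteer(rows)))
# -> [('Sydney', (-33.87, 151.21)), ('Tower Bridge', (51.51, -0.08))]
```

With 10 to 100 words per text, the per-text cost here is a few dictionary lookups per capitalized word; the one-off cost is loading the dump into memory. Names the tokenizer splits differently (for example hyphenated ones) would need extra handling.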
Is there a better method to find geographic information in a text?
And is there perhaps a better database with more geographic information?
Greetings,
destiny
You might want to index the text files with a library such as Lucene and then search for each of the cities in your list. The results would give you the file name and the location of the term (with a snippet of the surrounding text).
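Lucene itself is a Java library with its own query API, which is not reproduced here. As a hedged illustration of the same idea, here is a tiny in-memory inverted index in plain Python standing in for Lucene: each token maps to the (file name, position) pairs where it occurs, a multi-word city matches where its tokens sit at consecutive positions, and each hit comes back with a few words of context. All function names are hypothetical; a real Lucene index adds analyzers, on-disk storage and ranking.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

TOKEN = re.compile(r"\w+")
Posting = Tuple[str, int]  # (document name, token position)


def build_index(docs: Dict[str, str]) -> Dict[str, List[Posting]]:
    """Inverted index: lower-cased token -> every (document, position) it occurs at."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for name, text in docs.items():
        for pos, token in enumerate(TOKEN.findall(text)):
            index[token.lower()].append((name, pos))
    return index


def phrase_hits(index: Dict[str, List[Posting]], phrase: str) -> List[Posting]:
    """Start positions of a (possibly multi-word) phrase, e.g. "Tower Bridge"."""
    terms = [t.lower() for t in TOKEN.findall(phrase)]
    if not terms:
        return []
    postings = [set(index.get(t, [])) for t in terms]
    # A phrase matches where term k appears exactly k positions after the first term.
    return sorted(
        (doc, pos) for doc, pos in postings[0]
        if all((doc, pos + k) in postings[k] for k in range(1, len(terms)))
    )


def snippet(text: str, pos: int, length: int, width: int = 3) -> str:
    """The matched tokens plus up to `width` tokens of context on each side."""
    tokens = TOKEN.findall(text)
    return " ".join(tokens[max(0, pos - width):pos + length + width])


def search_cities(docs: Dict[str, str], cities: List[str]) -> List[Tuple[str, str, str]]:
    """Return (city, document name, snippet) for each occurrence of each city."""
    index = build_index(docs)  # built once, then reused for every city in the list
    results = []
    for city in cities:
        length = len(TOKEN.findall(city))
        for doc, pos in phrase_hits(index, city):
            results.append((city, doc, snippet(docs[doc], pos, length)))
    return results


docs = {"a.txt": "We visited the Tower Bridge in London last year.",
        "b.txt": "Munich is in Bavaria."}
print(search_cities(docs, ["Tower Bridge", "Munich", "Sydney"]))
# -> [('Tower Bridge', 'a.txt', 'We visited the Tower Bridge in London last'), ('Munich', 'b.txt', 'Munich is in Bavaria')]
```

The design inverts the question's loop: instead of checking every word of every text against the database, the texts are indexed once and each city in the list becomes a single index lookup. Note that this sketch matches case-insensitively, unlike the uppercase filter in the question.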