I have a site with content that is searchable using a search bar that is powered by Sphinx Search (only mentioned because I will end up using Sphinx’s geo-search functionality).
Table fields include:
Id, title, description, tags, geolocation
How can I go about determining if any part of a string contains a reference to a geographic location? The solution I am looking for will likely be performed in PHP and I will then search using Sphinx as I normally would.
For example, if someone searches for any of the following:
Car parts in California
Car parts near San Francisco
90210 car parts
Then I would like to be able to return a list of all entries that match car parts within a certain radius of the desired location.
I am open to any suggestions as to how to make this problem simpler.
Note: the geolocation substring entered by the user is optional. Therefore, the solution needs to determine its existence and then act accordingly.
There are a couple of APIs you could use for this:
http://www.datasciencetoolkit.org/ <– look at Geodict
http://developer.yahoo.com/geo/placemaker/guide/web-service.html
http://developers.metacarta.com/api/ <– look at Query Parser
… they perform all the “heavy lifting” for you 🙂
Alternatively, you could make your own with Sphinx itself!
Download a copy of the GeoNames database: http://www.geonames.org/
Stick it in a database table, and make a Sphinx index on it.
Then take your query string and run a SPH_MATCH_ANY query against the ‘geo’ table.
Then look through the Sphinx result set and extract any place matches, so you can build a new query without the place name.
This sphinx index will also return you geocoordinates you can use for the real query 🙂
(you could optimise it a bit to specifically notice the ‘in/near’ and either just remove them, or use them to explicitly identify the placename)
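The extraction step can be sketched roughly as follows. This is only an illustration, written in Python for brevity (the logic ports directly to PHP). The small `GAZETTEER` dictionary is a hypothetical stand-in for the Sphinx 'geo' index, its coordinates are approximate, and `split_place` is a made-up helper name, not part of Sphinx or GeoNames.

```python
# Hypothetical stand-in for the Sphinx 'geo' index: place name -> (lat, lon).
# In the real setup these rows would come back from the geo index query.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "california": (36.78, -119.42),
    "san francisco": (37.77, -122.42),
}

# Words that often introduce a place name and can be dropped with it.
MARKERS = {"in", "near"}


def split_place(query, gazetteer=GAZETTEER):
    """Return (remaining_terms, place, coords); place is None if none is found."""
    words = query.lower().split()
    # Try the longest word runs first so "san francisco" beats a shorter match.
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            if candidate in gazetteer:
                rest = words[:start] + words[start + size:]
                # Drop an 'in'/'near' marker that directly preceded the place.
                if start > 0 and rest[start - 1] in MARKERS:
                    del rest[start - 1]
                return " ".join(rest), candidate, gazetteer[candidate]
    # No place found: the location part is optional, so search as normal.
    return " ".join(words), None, None
```

With this sketch, `split_place("Car parts near San Francisco")` gives the remaining terms `"car parts"`, the place `"san francisco"`, and its coordinates, which could then feed the geo-distance filter on the real query. A query with no place comes back unchanged with `None` for the place.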
Good luck!
(The zip-code handling could be done in the same way: put the zip codes in the Sphinx index too. There are downloadable copies available online. Or it could be handled as a special case, by looking for a number.)
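A minimal sketch of the "special case" approach, again in Python for illustration. It assumes US-style five-digit zip codes, and `extract_zip` is a hypothetical helper name. Note that a five-digit number could just as well be a part number, so in practice you would probably still want to confirm the match against a zip-code table.

```python
import re

# Assumption: US-style zip codes, i.e. a standalone run of exactly five digits.
ZIP_RE = re.compile(r"\b\d{5}\b")


def extract_zip(query):
    """Return (remaining_terms, zip_code); zip_code is None if none is found."""
    match = ZIP_RE.search(query)
    if match is None:
        return query, None
    # Cut the zip code out and collapse any leftover whitespace.
    rest = query[:match.start()] + query[match.end():]
    return " ".join(rest.split()), match.group()
```

For example, `extract_zip("90210 car parts")` returns `("car parts", "90210")`, and the zip code can then be looked up to get coordinates for the radius search.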