I have been browsing a social network and found that it lets you search for a person by name, age range, city, country and gender.
The interesting thing is that all of this information can be typed into a single textbox, separated by spaces. The search engine then somehow parses it very accurately and returns a result list.
On one hand it seems pretty simple: split the query on spaces and search all the relevant tables for each part. So far so good.
However:
- There are cities whose names consist of two or more words, and users may enter them in different ways because it is free text.
- There are personal names that consist of two or more words.
Question:
How can we split the query in such a way that we know for certain which
part of it should be searched where, i.e. name in the users table, city
in the cities table, country in the countries table, etc.?
What I have done so far is:
- Fill the users datasource with all the users.
- Check whether any country from the Countries table appears in the query.
- If it does, filter the datasource to users from that country only.
- Check whether any city from the Cities table appears in the query.
- If it does, filter the datasource to users from that city only.
And so on for each table. Each time we find a match in a table, we remove the matched part from the query, which finally leaves us with the most free-form parameter: the name.
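To make the flow above concrete, here is a rough Python sketch of it. It is only an illustration under assumptions: the `COUNTRIES` and `CITIES` sets are made-up stand-ins for the real tables, `extract_known` and `parse_query` are hypothetical helper names, and only exact matches are handled. Trying the longest phrases first is one possible way to cope with multi-word cities and countries.

```python
# Hypothetical lookup data standing in for the Countries and Cities tables.
COUNTRIES = {"united states", "france", "new zealand"}
CITIES = {"new york", "paris", "rio de janeiro"}


def extract_known(tokens, known):
    """Find the longest run of consecutive tokens that appears in `known`.

    Returns (match, remaining_tokens); match is None when nothing is found.
    """
    for size in range(len(tokens), 0, -1):           # longest phrases first
        for start in range(len(tokens) - size + 1):
            phrase = " ".join(tokens[start:start + size])
            if phrase in known:
                # Remove the matched part from the query.
                return phrase, tokens[:start] + tokens[start + size:]
    return None, tokens


def parse_query(query):
    """Split a free-text query into country, city and the leftover name."""
    tokens = query.lower().split()
    country, tokens = extract_known(tokens, COUNTRIES)
    city, tokens = extract_known(tokens, CITIES)
    # Whatever is left is treated as the name, the most free-form part.
    return {"country": country, "city": city, "name": " ".join(tokens)}
```

For example, `parse_query("John Smith New York United States")` separates the country and the city and leaves `john smith` as the name.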
This seems to work as long as the user knows exactly how the cities, countries etc. are written in my DB,
but in reality the user may enter only part of a city name or mistype it.
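One possible way to tolerate partial or mistyped input, sketched in Python with the standard `difflib` module. This is an assumption about how it could be done, not a tested solution: the `CITIES` list, the `resolve_city` name and the `cutoff` value of 0.8 are all made up for illustration and would need tuning against real data.

```python
import difflib

# Hypothetical city names standing in for the Cities table.
CITIES = ["new york", "paris", "rio de janeiro", "san francisco"]


def resolve_city(text, cities=CITIES, cutoff=0.8):
    """Map user input to a known city, tolerating partial or mistyped text.

    Returns the matched city name, or None when nothing is close enough.
    """
    text = text.lower().strip()
    if not text:
        return None
    if text in cities:                      # exact match
        return text
    # Partial entry: accept it only when it identifies exactly one city.
    partial = [city for city in cities if text in city]
    if len(partial) == 1:
        return partial[0]
    # Mistyped entry: take the closest name by similarity ratio, if any.
    close = difflib.get_close_matches(text, cities, n=1, cutoff=cutoff)
    return close[0] if close else None
```

With this sketch, `resolve_city("New Yrok")` and `resolve_city("francisco")` both resolve to a known city, while input that resembles nothing in the list gives `None`.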
I don't know whether I am heading in the right direction at all with what I have done. It is just a starting point.
PS: I just need an algorithm flow, so the programming language doesn't really matter. Any idea or guidance is more than welcome.
Thanks
These kinds of queries are not a good fit for
relational databases. If a relational database is not a must, you may want to consider Lucene.Net (C#) or Lucene (Java).