I have a Lucene index containing documents like these:
_id | Name | Alternate Names | Population
123 | Bosc de Planavilla | (some names here in other languages) | 5000
345 | Planavilla | | 20000
456 | Bosc de la Planassa | | 1000
567 | Bosc de Plana en Blanca | | 100000
What’s the best Lucene query type to use, and how should I structure it, given the following requirements?
- If a user queries for “Italian Restaurant near Bosc de Planavilla”, I want the document with id 123 to be returned, because the query contains an exact match with the name of the document.
- If a user queries for “Italian Restaurant near Planavilla”, I want the document with id 345, because the query contains an exact match AND it has the highest population.
- If a user queries for “Italian Restaurant near Bosc”, I want 567, because the query contains “Bosc” AND, of the three “Bosc” documents, it has the highest population.
There are probably many other use cases, but you get a feel for what I need.
What kind of query will do this for me?
Should I generate word n-grams (shingles), build an ORed boolean query from the shingles, and then apply custom scoring? Or will a regular phrase query do? I also saw DisjunctionMaxQuery, but I don't know if it's what I'm looking for.
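To make the shingle idea concrete, here is a minimal, library-free sketch of the ranking I have in mind. It is not Lucene code: `DOCS`, `tokenize`, `shingles`, `longest_match`, and `best_location` are hypothetical names made up for illustration, and the assumed rule is "longest query shingle found in the name wins, population breaks ties". In Lucene the shingles would presumably come from an analyzer component such as `ShingleFilter`, with the tie-break handled by scoring or sorting.

```python
import re

# Toy copy of the index from the question (names and populations only).
DOCS = [
    {"id": 123, "name": "Bosc de Planavilla", "population": 5000},
    {"id": 345, "name": "Planavilla", "population": 20000},
    {"id": 456, "name": "Bosc de la Planassa", "population": 1000},
    {"id": 567, "name": "Bosc de Plana en Blanca", "population": 100000},
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, max_size):
    """Yield every contiguous run of 1..max_size tokens as a tuple."""
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            yield tuple(tokens[start:start + size])


def longest_match(query_tokens, name_tokens):
    """Length of the longest query shingle that also occurs in the name."""
    name_shingles = set(shingles(name_tokens, len(name_tokens)))
    best = 0
    for shingle in shingles(query_tokens, len(query_tokens)):
        if shingle in name_shingles:
            best = max(best, len(shingle))
    return best


def best_location(query):
    """Return the id of the best-matching document, or None if nothing matches."""
    query_tokens = tokenize(query)
    candidates = []
    for doc in DOCS:
        match_len = longest_match(query_tokens, tokenize(doc["name"]))
        if match_len > 0:
            candidates.append((match_len, doc["population"], doc["id"]))
    if not candidates:
        return None
    # Assumed rule: longer phrase match first, then higher population.
    match_len, population, doc_id = max(candidates, key=lambda c: (c[0], c[1]))
    return doc_id


for q in ("Italian Restaurant near Bosc de Planavilla",
          "Italian Restaurant near Planavilla",
          "Italian Restaurant near Bosc"):
    print(q, "->", best_location(q))
```

Note that this sketch does not distinguish a match on the full name from a match on part of it: for “Planavilla”, document 345 wins only because of its population, so a real solution might need an extra boost for whole-name matches.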
The idea, as you might have understood by now, is to find the exact location a user implied in their query. From that I can start my geo search and add some further querying around it.
What’s the best approach?
Thanks in advance.
Here is the code for sorting as well. That said, I think it would make more sense to add custom scoring that takes the city size into account rather than brute-forcing the sort on population. Also, please note that this uses the FieldCache, which may not be the best solution in terms of memory usage.
This gives the following results: