I’m going to bounty +100 this question when possible, even if it’s already answered and accepted
I’m using Lucene 3.2, here’s what I have in my index and code:
- More than 10 fields per indexed document.
- OR operator in the query phrase (i.e. “my lucene search” becomes “my OR lucene OR search”).
- MultiFieldQueryParser with Occur.SHOULD in all fields.
- A specific default field containing all other fields (as proposed in this solution: How to do a Multi field – Phrase search in Lucene?).
What am I trying to reach? A sort of Google-like search, let me explain:
- Search in all fields
- Scored results (with boost for specific fields, etc.)
- Adding words to the query phrase should filter results
I’m reaching every aspect but this last one. My problems are the following:
- If I search only in the default field containing all other fields, I don’t get well-scored results
- Searching only with the AND operator, I get results that are filtered far too much: only the documents that contain the whole query phrase in a single field.
- Searching only with the OR operator works perfectly with just one word in the query, but when more words are added to the query phrase, the number of results increases significantly instead of being narrowed down the way Google does.
- I don’t know how to filter one query by another.
This is my actual call to the query parser:
MultiFieldQueryParser.parse(
Version.LUCENE_31,
OrQueryWords, //query words separated with OR operand
searchFields, //String[] searchFields; // all fields
occurs, //Occur[] occurs; {Occur.SHOULD, Occur.SHOULD, etc..}
getFullTextSession().getSearchFactory().getAnalyzer(Product.class)
);
The toString() of this query prints something like this:
(field1:"word1 word2" (field1:word1 field1:word2)) (field2:"word1 word2" (...)) etc.
Right now I’m trying to add the default field (the one containing all other fields) with query words separated with AND operand and Occur.MUST:
MultiFieldQueryParser.parse(
Version.LUCENE_31,
AndQueryWords, //query words separated with AND operand
new String[] {"defaultField"},
new Occur[] {Occur.MUST},
getFullTextSession().getSearchFactory().getAnalyzer(Product.class)
);
The toString() of this query prints this:
+(default:"word1 word2" (+default:word1 +default:word2))
How can I intersect both queries? Is there any other solution to reach it?
I am not sure I understand exactly what you want to achieve, so I am going to give you a few hints on how to customize your scoring when dealing with multi-field, multi-term queries.
Intersection of two queries
You seem to be happy with the result set of your conjunctive query on the default field, and with the scoring of your disjunctive query on all fields. You can get the best of both worlds by using the latter as your main query and the former as a filter.
For example:
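A minimal sketch of that idea, assuming the Lucene 3.x API (`QueryWrapperFilter` and `IndexSearcher.search(Query, Filter, int)`). The class name `FilteredSearch` and the parameter names are made up for illustration: `scoringQuery` stands for the OR query over all fields and `restrictingQuery` for the AND query on the default field.

```java
import java.io.IOException;

import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TopDocs;

// Hypothetical helper, not part of Lucene: it only wires the two queries together.
public final class FilteredSearch {

    private FilteredSearch() {}

    /**
     * Ranks documents with scoringQuery, but only returns those
     * that also match restrictingQuery.
     */
    public static TopDocs search(IndexSearcher searcher,
                                 Query scoringQuery,      // disjunctive query on all fields
                                 Query restrictingQuery,  // conjunctive query on the default field
                                 int maxHits) throws IOException {
        // The wrapped query only decides which documents are allowed;
        // it does not contribute to the score.
        Filter filter = new QueryWrapperFilter(restrictingQuery);
        return searcher.search(scoringQuery, filter, maxHits);
    }
}
```

Since the filter does not take part in scoring, the ranking comes entirely from the multi-field query, while the set of matching documents is decided by the conjunctive one. If you go through Hibernate Search rather than a raw `IndexSearcher`, setting the same filter on the full-text query should give the same effect, but I have not checked that against your version.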
Minimum should match clauses
If AND-ing all clauses is too restrictive, and OR-ing all clauses is not restrictive enough, then you could do something in between by setting the minimum number of SHOULD clauses that must match so that a document appears in the resultset.
Then the difficult part is to find the right formula to compute the minimum number of SHOULD clauses which must match for optimal user experience.
For example, let’s say you want the ceiling of 3/4 of the SHOULD clauses to match. Starting with a two-clause query and adding clauses one at a time up to five clauses would yield the following evolution of the number of results.
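The thresholds behind that evolution can be worked out with a few lines of plain Java. This is only a sketch: the class and method names are hypothetical, and the resulting value is what you would pass to `BooleanQuery.setMinimumNumberShouldMatch(int)` on the query that holds the SHOULD clauses.

```java
// Hypothetical helper: computes how many SHOULD clauses must match.
public final class MinShouldMatch {

    private MinShouldMatch() {}

    /** Returns ceil(clauseCount * numerator / denominator) using integer arithmetic. */
    public static int minimumShouldMatch(int clauseCount, int numerator, int denominator) {
        if (clauseCount < 0 || numerator < 0 || denominator <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Adding (denominator - 1) before dividing rounds the result up.
        return (clauseCount * numerator + denominator - 1) / denominator;
    }

    public static void main(String[] args) {
        // Ceil of 3/4 of the clauses, for queries of 2 to 5 SHOULD clauses.
        for (int clauses = 2; clauses <= 5; clauses++) {
            System.out.println(clauses + " SHOULD clauses -> at least "
                    + minimumShouldMatch(clauses, 3, 4) + " must match");
        }
    }
}
```

With the 3/4 ratio the thresholds are 2, 3, 3 and 4 for 2, 3, 4 and 5 clauses. Going from 2 to 3 clauses, or from 4 to 5, the threshold rises along with the clause count, so the result set can only shrink or stay the same. Going from 3 to 4 clauses the threshold stays at 3, so every previous hit still matches and new documents can come in: the result set can only grow or stay the same.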
Anyway, with this feature, the only way to guarantee that the number of results shrinks (or at least never grows) each time a clause is added is to have a purely conjunctive query.