The title might be somewhat ambiguous, but bear with me. (The only similar question I could find was "Solr: Search in multiple fields BUT STOP if documents match was found", but it did not provide any solutions.) I have the following structure for my Lucene documents:
FieldA (Store.YES, Index.ANALYZED), primary identification of an entity
FieldB (Store.YES, Index.ANALYZED), secondary identification(s) of an entity
FieldA could, for example, contain a string like car, while FieldB could contain strings like automobile, vehicle, etc. There can be multiple FieldB fields in a document. The index analyzer is a StandardAnalyzer and the search analyzer is a KeywordAnalyzer (that seemed to yield the best results, but I am not sure it is the best approach). The identifier in FieldA is of higher importance than the identifier(s) in FieldB.
Let’s say the index contains 3 documents (with FieldA | FieldB fields):
"car" | "vehicle" "automobile"
"car parts" | "parts, car"
"car shop" | "shop, car"
So far, so good. Now where the problem lies:
When querying for "car", I would like to see the following result (scores are made up):
car, score 1.0
car parts, score 0.9
car shop, score 0.9
The document with the FieldA value of “car” should show up first, since FieldA is considered more important, and the query matches that value best. In reality, the following happens:
car parts, score 0.625
car shop, score 0.625
car, score 0.5073969
searcher.explain() outputs the following: (left the explain for “car shop” out, since it is the same as “car parts”)
Explain: 0.625 = (MATCH) max of:
0.31712303 = (MATCH) weight(fielda:car in 0), product of:
0.71231794 = queryWeight(fielda:car), product of:
0.71231794 = idf(docFreq=3, maxDocs=3)
1.0 = queryNorm
0.4451987 = (MATCH) fieldWeight(fielda:car in 0), product of:
1.0 = tf(termFreq(fielda:car)=1)
0.71231794 = idf(docFreq=3, maxDocs=3)
0.625 = fieldNorm(field=fielda, doc=0)
0.625 = (MATCH) fieldWeight(fieldb:car in 0), product of:
1.0 = tf(termFreq(fieldb:car)=1)
1.0 = idf(docFreq=2, maxDocs=3)
0.625 = fieldNorm(field=fieldb, doc=0)
Explain: 0.5073969 = (MATCH) max of:
0.5073969 = (MATCH) weight(fielda:car in 0), product of:
0.71231794 = queryWeight(fielda:car), product of:
0.71231794 = idf(docFreq=3, maxDocs=3)
1.0 = queryNorm
0.71231794 = (MATCH) fieldWeight(fielda:car in 0), product of:
1.0 = tf(termFreq(fielda:car)=1)
0.71231794 = idf(docFreq=3, maxDocs=3)
1.0 = fieldNorm(field=fielda, doc=0)
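To make it easier to see why "car parts" wins, here is a small sketch that only re-does the arithmetic printed by explain() above. It does not call Lucene; the class and method names are made up for illustration, and the numbers are copied from the output.

```java
// Hypothetical helper, not part of Lucene: it recomputes the explain() figures by hand.
final class ExplainArithmetic {

    // fieldWeight = tf * idf * fieldNorm, as listed in the explain() output
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // weight = queryWeight * fieldWeight, where queryWeight = idf * queryNorm
    static float weight(float idf, float queryNorm, float fieldWeight) {
        return idf * queryNorm * fieldWeight;
    }

    // "car parts": the score is the max over the fielda and fieldb clauses
    static float carPartsScore() {
        float fieldA = weight(0.71231794f, 1.0f, fieldWeight(1.0f, 0.71231794f, 0.625f));
        float fieldB = weight(1.0f, 1.0f, fieldWeight(1.0f, 1.0f, 0.625f));
        return Math.max(fieldA, fieldB);
    }

    // "car": only the fielda clause matches, with fieldNorm = 1.0
    static float carScore() {
        return weight(0.71231794f, 1.0f, fieldWeight(1.0f, 0.71231794f, 1.0f));
    }

    public static void main(String[] args) {
        System.out.println("car parts: " + carPartsScore());
        System.out.println("car:       " + carScore());
    }
}
```

The fieldb clause of "car parts" carries an idf of 1.0, so its 0.625 fieldNorm alone is enough to beat the fielda score of "car", whose idf of about 0.71 is applied twice.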
TL;DR: with the two fields, boosting FieldA will not help because all 3 documents will get boosted. How do I get Lucene to rank the closest match ("car" in this example) the highest? That is, how do I stop searching in the current document once the (more important) match in FieldA is encountered?
Use the NOT syntax:
a:car^2 (+b:car -a:car)
This way, matches in b are ignored unless the document fails to match a.
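As a rough illustration of what this query does to the three example documents, the following sketch mimics the two clauses in plain Java. It is not Lucene code: the class and method are hypothetical, the per-field scores are taken from the explain() output in the question, and it ignores coord and the renormalised queryNorm, so only the relative order is meaningful, not the absolute values.

```java
// Hypothetical simulation of: a:car^2 (+b:car -a:car)
final class FallbackQuerySketch {

    static float score(boolean aMatches, float aScore,
                       boolean bMatches, float bScore, float boostA) {
        float total = 0f;
        if (aMatches) {
            total += boostA * aScore;   // clause 1: a:car^2
        }
        if (bMatches && !aMatches) {
            total += bScore;            // clause 2: +b:car -a:car (b counts only when a misses)
        }
        return total;
    }

    public static void main(String[] args) {
        // "car": matches a only; "car parts": matches both a and b
        System.out.println("car:       " + score(true, 0.5073969f, false, 0f, 2f));
        System.out.println("car parts: " + score(true, 0.31712303f, true, 0.625f, 2f));
    }
}
```

Because every example document matches a:car, the b clause is excluded for all of them, and the ranking falls back to the fielda scores, where the shorter field "car" has the higher fieldNorm.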