The title might be somewhat ambiguous, but bear with me (The only similar question

Question

Asked: May 28, 2026 (2026-05-28T00:24:02+00:00)

The title might be somewhat ambiguous, but bear with me (the only similar question I could find was "Solr: Search in multiple fields BUT STOP if documents match was found", and it did not provide any solutions). I have the following structure for my Lucene documents:

FieldA (Store.YES, Index.ANALYZED), primary identification of an entity
FieldB (Store.YES, Index.ANALYZED), secondary identification(s) of an entity

FieldA could, for example, contain a string like car, while FieldB could contain strings like automobile, vehicle, etc. There can be multiple FieldB fields in a document. The index analyzer is a StandardAnalyzer and the search analyzer is a KeywordAnalyzer (that seemed to yield the best results; I am not sure it is the best approach). The identifier in FieldA is of higher importance than the identifier(s) in FieldB.

Let’s say the index contains 3 documents (with FieldA | FieldB fields):

"car"       | "vehicle" "automobile"
"car parts" | "parts, car"
"car shop"  | "shop, car"

So far, so good. Now where the problem lies:

When querying for "car", I would like to see the following result (scores are made up):

car, score 1.0
car parts, score 0.9
car shop, score 0.9

The document with the FieldA value of “car” should show up first, since FieldA is considered more important, and the query matches that value best. In reality, the following happens:

car parts, score 0.625
car shop, score 0.625
car, score 0.5073969

searcher.explain() outputs the following (I left out the explanation for “car shop”, since it is the same as for “car parts”):

Explain: 0.625 = (MATCH) max of:
  0.31712303 = (MATCH) weight(fielda:car in 0), product of:
    0.71231794 = queryWeight(fielda:car), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      1.0 = queryNorm
    0.4451987 = (MATCH) fieldWeight(fielda:car in 0), product of:
      1.0 = tf(termFreq(fielda:car)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.625 = fieldNorm(field=fielda, doc=0)
  0.625 = (MATCH) fieldWeight(fieldb:car in 0), product of:
    1.0 = tf(termFreq(fieldb:car)=1)
    1.0 = idf(docFreq=2, maxDocs=3)
    0.625 = fieldNorm(field=fieldb, doc=0)
Explain: 0.5073969 = (MATCH) max of:
  0.5073969 = (MATCH) weight(fielda:car in 0), product of:
    0.71231794 = queryWeight(fielda:car), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      1.0 = queryNorm
    0.71231794 = (MATCH) fieldWeight(fielda:car in 0), product of:
      1.0 = tf(termFreq(fielda:car)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      1.0 = fieldNorm(field=fielda, doc=0)
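
The numbers in the explain() output can be checked by hand. Below is a minimal Java sketch that only multiplies the factors printed above; it does not call Lucene, and the class and method names are made up for illustration. It suggests why "car parts" wins: its fielda match is scaled down by idf twice and by the 0.625 fieldNorm, but its fieldb match (idf 1.0) contributes a full 0.625, which is higher than the 0.5073969 that the exact fielda match of "car" receives.

```java
// Re-computes the scores from the explain() output using only the factors
// printed there. This is plain arithmetic, not a call into Lucene.
final class ExplainArithmetic {

    // fieldWeight = tf * idf * fieldNorm, as listed in the explain tree
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // weight = queryWeight * fieldWeight, where queryWeight = idf * queryNorm
    static double weight(double idf, double queryNorm, double fieldWeight) {
        return (idf * queryNorm) * fieldWeight;
    }

    // "car parts": the explain tree takes the max of the fielda and fieldb matches
    static double carPartsScore() {
        double fieldA = weight(0.71231794, 1.0, fieldWeight(1.0, 0.71231794, 0.625));
        // The fieldb branch is printed as a bare fieldWeight (no queryWeight line)
        double fieldB = fieldWeight(1.0, 1.0, 0.625);
        return Math.max(fieldA, fieldB);
    }

    // "car": only the fielda branch matches, with fieldNorm 1.0
    static double carScore() {
        return weight(0.71231794, 1.0, fieldWeight(1.0, 0.71231794, 1.0));
    }

    public static void main(String[] args) {
        System.out.println("car parts: " + carPartsScore());
        System.out.println("car:       " + carScore());
    }
}
```

Up to rounding, the two results agree with the 0.625 and 0.5073969 reported by explain(), so the ordering comes from the fieldb branch of the max, not from the fielda match itself.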

TL;DR: with the two fields, boosting FieldA will not help, because all 3 documents will get boosted. How can I get Lucene to rank the closest match ("car" in this example) highest? In other words, how do I stop searching in the current document once the (more important) match in FieldA is encountered?

1 Answer

Editorial Team · Answer 1 · Added on May 28, 2026 at 12:24 am (2026-05-28T00:24:03+00:00)

Use the NOT (-) syntax to exclude FieldA matches from the FieldB clause:

a:car^2 (+b:car -a:car)

This way, a match in b only counts for documents that fail to match a; a document that matches in both fields is scored on a alone. (Here a and b stand for FieldA and FieldB.)
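
To see what this does to the three example documents, here is a rough Java model of the query, assuming a and b are shorthand for fielda and fieldb. It is a simplification, not Lucene's scorer: it treats the ^2 boost as a plain multiplier, sums the matching top-level clauses, and ignores queryNorm and coord (which rescale scores but should not reorder documents that match the same clause). The per-field scores are the ones from the explain() output in the question; the class and method names are hypothetical.

```java
// Simplified model of the query  a:car^2 (+b:car -a:car)
// Assumption: boost is a plain multiplier; queryNorm and coord are ignored.
final class ExclusiveFieldQuery {

    // aScore / bScore: per-field scores for one document, 0 meaning "no match"
    static double score(double aScore, double bScore) {
        boolean matchesA = aScore > 0;
        boolean matchesB = bScore > 0;
        double total = 0.0;
        if (matchesA) {
            total += 2.0 * aScore;          // clause 1: a:car^2
        }
        if (matchesB && !matchesA) {
            total += bScore;                // clause 2: +b:car -a:car
        }
        return total;
    }

    public static void main(String[] args) {
        // Per-field values taken from the explain() output in the question
        System.out.println("car:       " + score(0.5073969, 0.0));
        System.out.println("car parts: " + score(0.31712303, 0.625));
        System.out.println("car shop:  " + score(0.31712303, 0.625));
    }
}
```

In this model "car" comes out ahead of "car parts" and "car shop", because their 0.625 match in b is excluded by the -a:car clause and only the length-normalised a match remains.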
