I have a number of fields that either only ever contain one term, or that I don't want to be disadvantaged when they contain a greater number of terms. I never boost these fields, so I disable norms for them with Field.Index.ANALYZED_NO_NORMS or Field.Index.NOT_ANALYZED_NO_NORMS.
But now if I'm searching on two fields,
e.g.
fielda:term1 OR fieldb:term2
and fielda has norms enabled and fieldb doesn't, doesn't that mean that documents matching fieldb are likely to score better than documents matching fielda? A document matching only fielda ends up with a lower score in the
weight = tf * idf * fieldnorm
calculation, because fieldnorm will be less than one if that field contains more than one term.
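To make the arithmetic concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes the default length norm of 1 / sqrt(numTerms), no field boost, and a norm of 1.0 when norms are disabled. The class and method names are made up for illustration, and the real score also includes factors such as queryNorm and coord, which are left out here. Stored norms are also compressed into a single byte, so actual values would only be approximate.

```java
public class NormSketch {
    // Length norm under the assumed default: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Simplified per-term weight: tf * idf * fieldNorm.
    // With norms disabled the field norm is assumed to be 1.0.
    static double weight(double tf, double idf, int numTerms, boolean normsEnabled) {
        double fieldNorm = normsEnabled ? lengthNorm(numTerms) : 1.0;
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        double tf = 1.0;   // same term frequency for both matches
        double idf = 2.0;  // same idf for both matches (illustrative value)

        // fielda: norms enabled, field holds three terms
        System.out.println("fielda (norms, 3 terms):    " + weight(tf, idf, 3, true));
        // fieldb: norms disabled, field holds three terms
        System.out.println("fieldb (no norms, 3 terms): " + weight(tf, idf, 3, false));
        // fielda with a single term: norm is 1.0, so no penalty
        System.out.println("fielda (norms, 1 term):     " + weight(tf, idf, 1, true));
    }
}
```

With equal tf and idf, the norms-enabled field only matches the norms-disabled one when it holds a single term; any longer field is scaled down, which is the imbalance described above.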
That's not what I wanted. I just wanted a document matching on fieldb that contains three terms in that field to score as well as a document whose fieldb contains one term.
Have I understood this right? All the discussion about fieldnorm focuses on the fact that it takes up memory and is not necessary if your field only contains one term. I've read no discussion of how it affects the results through the apparent advantage a field with norms disabled has over a field with norms enabled.
My recommendation would be not to mix queries on fields whose norms are disabled with queries on standard fields. The point of disabling norms is to save space when a query is only used as a filter (and does not contribute to the score).
The elegant way of doing what you want would be to have two different similarities for your fields. However, this feature (per-field similarity) is currently only available in the development version.