Base Match Query: Billy Sue
Test Match Query #1: Billy Sue and
Test Match Query #2: Billy and Sue
Base and #1 end up with identical scores, but Base and #2 have similar yet different scores.
According to the analyze API, the stop word "and" is removed from both test queries, but the start_offset and end_offset token properties of "Sue" differ between the Base query and Test Query #2.
Essentially, the distance between the remaining tokens is recorded before the stop word is removed, and it has a small but nonzero impact on scoring.
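To make the effect concrete, here is a rough Python sketch of what seems to be happening. It is a toy model, not Lucene's or Elasticsearch's actual code: the `analyze` function, the whitespace tokenization, and the one-word stop list are all assumptions made for illustration. It only mimics the token, start_offset, end_offset, and position values that the analyze API reports.

```python
# Toy model of a tokenizer followed by a stop filter.
# NOT Lucene's implementation; names and behavior here are illustrative assumptions.
import re

STOP_WORDS = {"and"}  # assumed stop list for this illustration


def analyze(text, stop_words=STOP_WORDS):
    """Return surviving tokens with the offsets and position
    that were assigned before stop-word removal."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text)):
        term = match.group().lower()
        if term in stop_words:
            # The token is dropped, but later tokens keep the
            # offsets and position they had in the original text.
            continue
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


base = analyze("Billy Sue")
test1 = analyze("Billy Sue and")
test2 = analyze("Billy and Sue")

print(base == test1)  # a trailing stop word leaves no trace
print(base[1])        # "sue" in the Base query
print(test2[1])       # "sue" in Test Query #2, shifted by the removed "and"
```

In this model the trailing "and" in #1 changes nothing, while the "and" in the middle of #2 leaves a gap: "sue" keeps the offsets and position it had before the stop word was dropped.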
The Question
Is there a way to delay the calculation of the start_offset and end_offset properties of tokens until after stop-words are removed, or otherwise prevent removed stop-words from influencing scoring in any fashion?
Perhaps disable position increments on the stop word filter and see if that helps? Especially if your mapping has some kind of filter after the stop word filter, you’ll get strange artifacts from the position increments.
E.g. something like this: