When I query for “elegant” in Solr, I get results for “elegance” too.
I use these filters for index-time analysis:
WhitespaceTokenizerFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
SynonymFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
ReversedWildcardFilterFactory
and these for query-time analysis:
WhitespaceTokenizerFactory
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
I want to know which filter is affecting my search results.
EnglishPorterFilterFactory
That's the short answer 😉
A little more information:
“English Porter” refers to the Porter stemming algorithm for English. According to the stemmer (which is a heuristic word-root builder), “elegant” and “elegance” have the same stem.
You can verify this with an online Porter stemmer demo. You will see that “elegant” and “elegance” are both reduced to the same stem: “eleg”.
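To see why both words collapse to “eleg”, here is a minimal Python sketch of only the part of the Porter algorithm that matters here: the rule that strips “ance” and “ant” when the remaining stem has a measure greater than 1. It is not the full stemmer and not Solr's code; the function names `is_consonant`, `measure`, and `strip_ant_ance` are made up for this illustration.

```python
VOWELS = set("aeiou")

def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant only at the start or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Count vowel-run -> consonant-run transitions (the 'm' of the Porter algorithm)."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

def strip_ant_ance(word):
    """Apply only the 'ance'/'ant' rule: drop the suffix if measure(stem) > 1."""
    for suffix in ("ance", "ant"):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            return stem if measure(stem) > 1 else word
    return word

print(strip_ant_ance("elegant"))   # eleg
print(strip_ant_ance("elegance"))  # eleg
print(strip_ant_ance("plant"))     # plant (stem "pl" is too short to strip)
```

Because the same filter runs at index time and at query time, a query for “elegant” is turned into “eleg” and matches documents where “elegance” was indexed as “eleg”.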
From Solr source:
This is exactly where the protwords file comes into play:
That's the part that affects the stemming. There you see the invocation of the Snowball library.
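The role of the protwords file can be sketched in a few lines of Python. This is only an illustration of the idea (tokens listed as protected skip the stemmer), not the Solr implementation; `load_protected`, `toy_stem`, and `analyze` are hypothetical names, and `toy_stem` is a crude stand-in for the real Porter/Snowball stemmer. The file format assumed here is one word per line with `#` comment lines.

```python
def load_protected(lines):
    """Parse protwords-style lines: one word per line, '#' starts a comment line."""
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word)
    return words

def toy_stem(token):
    """Stand-in stemmer for illustration only: strips 'ance' or 'ant'."""
    for suffix in ("ance", "ant"):
        if token.endswith(suffix):
            return token[:-len(suffix)]
    return token

def analyze(tokens, protected):
    """Stem every token except those listed as protected."""
    return [t if t in protected else toy_stem(t) for t in tokens]

protected = load_protected(["# words that must not be stemmed", "elegance"])
print(analyze(["elegant", "elegance"], protected))  # ['eleg', 'elegance']
```

With “elegance” protected, it is indexed unstemmed, so a query for “elegant” (stemmed to “eleg”) no longer matches it. If you go this route, the same protected list has to be used in both the index and the query analyzer, and the documents need to be reindexed.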