Question

Asked: May 25, 2026

I'm using Lucene with Solr to index some documents (news). Each document also has a HEADLINE field.
I want to run a facet search over the HEADLINE field to find the terms with the highest counts.
All of this works without a problem, including a stopword list.
The HEADLINE field is multivalued. I use solr.StandardTokenizerFactory to split the field into single terms (I know this is not best practice, but it's the only way and it works).

Sometimes the tokenizer splits terms that shouldn't be split, such as 9/11 (which is split into 9 and 11). So I decided to use a "protword" list, and "9/11" is in that list. But nothing changed.

Here is the relevant part of my schema.xml:

  <fieldType name="facet_headline" class="solr.TextField" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory" protected="protwords.txt"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.TrimFilterFactory" />
            <filter class="solr.StopFilterFactory"
                    ignoreCase="true"
                    words="stopwords.txt"
                    enablePositionIncrements="true"
                protected="protwords.txt"
                />
        </analyzer>
   </fieldType>

Looking at the facet results, I see lots of documents dealing with "9/11" faceted under "9" or "11", but never under "9/11".

Why doesn't this work?

Thank you.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T16:54:05+00:00

The final solution to this problem was to use solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory.
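For anyone wondering what that might look like: as far as I know, `protected` is only honored by certain filters (the stemmers and solr.WordDelimiterFilterFactory, for example), not by solr.StandardTokenizerFactory or solr.StopFilterFactory, so the protwords.txt entry has no effect on how "9/11" is tokenized. Below is a minimal sketch of the field type with a pattern tokenizer. The split pattern is my own assumption, not the configuration that was actually used; it treats any run of characters other than letters, digits, and "/" as a delimiter, so "9/11" stays a single token.

```xml
<!-- Sketch only: the split pattern is an assumption, not the original poster's configuration. -->
<fieldType name="facet_headline" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- group="-1" makes the pattern act as a delimiter (like String.split):
         split on any run of characters that is not a letter, a digit, or "/" -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="[^\p{L}\p{N}/]+"
               group="-1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

The trade-off is that every slash-joined term (for example "and/or") is also kept as one token, so the pattern may need tightening for your data. Documents have to be reindexed after the schema change before the facet counts reflect it.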
