The Archive Base

Editorial Team
Asked: June 9, 2026


I am attempting to optimize highlighting in my Solr instance, as it seems to slow down queries by two orders of magnitude. I have a tokenized field that is indexed and stored, with the following definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="%2B"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="%2B"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Term vectors, positions and offsets are also generated:

<field name="Events" type="text_general" multiValued="true" stored="true" indexed="true" termVectors="true" termPositions="true"  termOffsets="true"/>

For the highlight component I use the default Solr configuration. The query below uses the FastVectorHighlighter but still takes ~1500 ms, which is very long for ~1000 docs with 10-20 values stored in the field per doc. Here is the query:

q=Events:http\://mydomain.com/resource/term/906&fq=(Document_Code:[*+TO+*])&hl.requireFieldMatch=true&facet=true&hl.simple.pre=<b>&hl.fl=*&hl=true&rows=10&version=2&fl=uri,Document_Type,Document_Title,Modification_Date,Study&hl.snippets=1&hl.useFastVectorHighlighter=true

What I find curious is that, according to the Solr admin stats, a single query generates 9146 requests to HtmlFormatter and GapFragmenter. Any thoughts on why this might be happening, and how the highlighter's performance can be improved?


1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 1:42 am

    It appears that the problem is caused by “hl.fl=*”, which makes the DefaultSolrHighlighter iterate over a relatively large number of fields (in my index) for each document found (10 max in my case). The result is a nested loop whose cost grows with the number of documents times the number of fields (roughly O(n^2)). Here is the relevant code snippet:

    for (int i = 0; i < docs.size(); i++) {
      int docId = iterator.nextDoc();
      Document doc = searcher.doc(docId, fset);
      NamedList docSummaries = new SimpleOrderedMap();
      for (String fieldName : fieldNames) {
        fieldName = fieldName.trim();
        if( useFastVectorHighlighter( params, schema, fieldName ) )
          doHighlightingByFastVectorHighlighter( fvh, fieldQuery, req, docSummaries, docId, doc, fieldName );
        else
          doHighlightingByHighlighter( query, req, docSummaries, docId, doc, fieldName );
      }
      String printId = schema.printableUniqueKey(doc);
      fragments.add(printId == null ? null : printId, docSummaries);
    }
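To make the cost of that nested loop concrete, here is a small sketch that counts how many per-field highlighting attempts a single request triggers. The class and method names are hypothetical, and the field counts are illustrative numbers, not measurements from the index discussed here.

```java
class HighlightCost {
    // One highlighting attempt is made per (document, field) pair,
    // mirroring the nested loop in the snippet above.
    static long highlightAttempts(int docsReturned, int fieldsRequested) {
        return (long) docsReturned * fieldsRequested;
    }

    public static void main(String[] args) {
        int rows = 10; // rows=10, as in the query from the question
        System.out.println(highlightAttempts(rows, 1));   // a single explicit field
        System.out.println(highlightAttempts(rows, 20));  // 20 explicit fields
        System.out.println(highlightAttempts(rows, 500)); // assumed: hl.fl=* expanding to 500 fields
    }
}
```

Since the number of rows is fixed by the request, the number of fields matched by hl.fl is the factor that can actually be controlled.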
    

    Reducing the number of fields should improve the behaviour greatly. However, in my case I cannot reduce it below 20 fields, so I will check whether enabling the FastVectorHighlighter for all of them will improve the overall performance.
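As a rough sketch of what reducing the field list looks like on the request side, the following builds the highlighting parameters with an explicit, comma-separated hl.fl instead of "*". The class and method names are hypothetical. "Events" and "Document_Title" appear in the question, but whether they are the right fields to highlight is an assumption. The parameter names themselves are the ones already used in the original query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class ExplicitHighlightFields {
    // Builds the highlighting part of a Solr query string with an explicit hl.fl list.
    static String highlightParams(List<String> fields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("hl", "true");
        params.put("hl.fl", String.join(",", fields)); // explicit fields instead of "*"
        params.put("hl.snippets", "1");
        params.put("hl.requireFieldMatch", "true");
        params.put("hl.useFastVectorHighlighter", "true");

        // URL-encode each value and join the pairs with '&'.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Field names here are illustrative; substitute the fields that need highlighting.
        System.out.println(highlightParams(List.of("Events", "Document_Title")));
    }
}
```

The resulting string can be appended to the q, fq and fl parameters of the original request in place of its hl.* parameters.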

    I was also wondering whether we could reduce this list even further by using some info from the matching docs (which are already available at this point).
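One possible reading of that idea, sketched below, is to skip any requested field for which the matched document stores no value, so that the highlighter is never invoked for it. This is only an illustration of the pruning step with plain Java collections. The class, the method and the map-based document representation are all hypothetical, and nothing here is an existing Solr hook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FieldPruning {
    // Keeps only the requested fields for which the document stores at least one value.
    static List<String> fieldsWorthHighlighting(List<String> requested,
                                                Map<String, List<String>> storedValues) {
        List<String> kept = new ArrayList<>();
        for (String field : requested) {
            String name = field.trim();
            List<String> values = storedValues.get(name);
            if (values != null && !values.isEmpty()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical document: only "Events" has stored values.
        Map<String, List<String>> doc = Map.of(
                "Events", List.of("http://mydomain.com/resource/term/906"),
                "Study", List.of());
        List<String> requested = List.of("Events", "Study", "Document_Title");
        System.out.println(fieldsWorthHighlighting(requested, doc));
    }
}
```

Whether such a check pays off depends on how sparse the fields are across documents; for densely populated fields it would remove very little work.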

    Update

    Using the FastVectorHighlighter for all fields (with termVectors, termPositions and termOffsets set to true for all tokenized fields) did indeed improve the highlighting speed by an order of magnitude, so that all queries now run in under 1 s. The cost is a larger index: it grew roughly fourfold (from 500M to 2G). There is also a problem with how the fragments for multivalued fields are generated, but the performance gain is large enough to accept it.
