I have a working Lucene index supporting a suggestion service. When a user types

Question


Asked: June 1, 2026

I have a working Lucene index supporting a suggestion service. When a user types into a search box, it queries the index by SUGGESTION_FIELD. Each entry in SUGGESTION_FIELD can be in any one of many supported languages, and each is indexed with the appropriate language-specific analyzer. To record which analyzer was used, a second field per entry stores the LOCALE. So at query time I can run a language-specific query with the appropriate analyzer, like this:

QueryParser parser = new QueryParser(Version.LUCENE_33, SUGGESTION_FIELD, getLangaugeAnalyzer(locale));
return searcher.search(parser.parse("SUGGESTION_FIELD:" + queryString + " AND LOCALE:"
                + locale), 100);

This works. But now the client wants to be able to search using multiple languages at once.

My question: what would be the fastest querying solution, bearing in mind that a suggestion service needs to be very fast?

Sol. #1. The simplest solution would seem to be to run the query multiple times, once for each locale, applying the corresponding language analyzer each time, and finally combine the results from each query in some sensible fashion.
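The "combine the results" step of Sol. #1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Hit` class and `merge` method are hypothetical names, not Lucene types, and the sketch assumes each per-locale search has already been reduced to a list of (text, locale, score) entries. Scores from separate queries are not strictly comparable, so sorting on them is a heuristic rather than a correct global ranking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SuggestionMerger {

    /** Hypothetical value type: one suggestion plus the score its own query gave it. */
    public static final class Hit {
        public final String text;
        public final String locale;
        public final float score;

        public Hit(String text, String locale, float score) {
            this.text = text;
            this.locale = locale;
            this.score = score;
        }
    }

    /** Concatenates the per-locale lists, sorts by score descending, keeps the top `limit`. */
    public static List<Hit> merge(List<List<Hit>> perLocaleHits, int limit) {
        List<Hit> all = new ArrayList<Hit>();
        for (List<Hit> hits : perLocaleHits) {
            all.addAll(hits);
        }
        Collections.sort(all, new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score); // highest score first
            }
        });
        // Copy the sublist so the result does not keep a view onto `all`.
        return new ArrayList<Hit>(all.subList(0, Math.min(limit, all.size())));
    }
}
```

Each per-locale search is still a separate trip through the index, so the cost of this approach grows with the number of locales selected.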

Sol. #2. Alternatively, I could re-index using a separate field for each locale, such that:

SUGGESTION_FIELD_en, SUGGESTION_FIELD_fr, SUGGESTION_FIELD_es, etc.

using a different analyzer for each field (via PerFieldAnalyzerWrapper), and then query using a more complex query string such that:

"SUGGESTION_FIELD_en:" + queryString + " AND SUGGESTION_FIELD_fr:" + queryString + " AND SUGGESTION_FIELD_es:" + queryString

Please help if you think you can 🙂

1 Answer

Editorial Team · Answer 1 · 2026-06-01T11:26:49+00:00

Your query is going to look something like this: (sugField:queryString1 AND locale:loc1) OR (sugField:queryString2 AND locale:loc2) OR …. This is a top-level BooleanQuery with subordinate BooleanQueries added with Occur.SHOULD, where each subordinate query has its terms added with Occur.MUST. Here queryString1, queryString2, etc. are the outputs of the different language analyzers for the same input: the string the user entered.

Each subordinate query contains mandatory terms (from your query string) that are rare in the index, and Lucene knows this at the outset (it knows the document frequency of every Term in the index), so it will first constrain the result by the queryString terms and then intersect that with the far more common locale term. This will be very efficient no matter how large your index is.
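To see why leading with the rare term keeps the intersection cheap, here is a deliberately simplified model. It is not Lucene's actual implementation, and the class and method names are made up for illustration. Posting lists are represented as sorted arrays of document ids, and the shorter list drives the loop while the longer one is only probed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RareTermFirst {

    /**
     * Intersects two ascending arrays of document ids. The shorter (rarer) list
     * drives the loop; the longer one is only probed with binary search, so the
     * work is roughly |rare| * log(|common|) instead of |rare| + |common|.
     */
    public static List<Integer> intersect(int[] a, int[] b) {
        int[] rare = a.length <= b.length ? a : b;
        int[] common = a.length <= b.length ? b : a;
        List<Integer> result = new ArrayList<Integer>();
        for (int docId : rare) {
            if (Arrays.binarySearch(common, docId) >= 0) {
                result.add(docId);
            }
        }
        return result;
    }
}
```

In this model a locale term matching millions of documents costs little, because only the handful of documents matching the query-string term are ever checked against it.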

As for the different analyzers, I suggest you don’t use the QueryParser but build the entire query programmatically. This is good general advice whenever the query is not entered by hand, and in your case it is the only way to gain control over analysis. Run your query string through each of the language-specific analyzers and add their output tokens as TermQueries to the subordinate BooleanQueries.
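A minimal sketch of that approach, assuming the Lucene 3.x API implied by Version.LUCENE_33 in the question. The class name `MultiLocaleQueryBuilder`, the `build` method, and the locale-to-analyzer map are illustrative assumptions, as is the premise that LOCALE was indexed un-analyzed so its raw value (for example "en") can be matched with a plain TermQuery.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MultiLocaleQueryBuilder {

    // Placeholder field names standing in for the constants used in the question.
    static final String SUGGESTION_FIELD = "SUGGESTION_FIELD";
    static final String LOCALE = "LOCALE";

    /** Builds (tokens AND locale) OR (tokens AND locale) ... across the given locales. */
    public static BooleanQuery build(String userInput, Map<String, Analyzer> analyzersByLocale)
            throws IOException {
        BooleanQuery topLevel = new BooleanQuery();
        for (Map.Entry<String, Analyzer> entry : analyzersByLocale.entrySet()) {
            BooleanQuery perLocale = new BooleanQuery();
            // Analyze the raw input with this locale's analyzer.
            TokenStream tokens = entry.getValue()
                    .tokenStream(SUGGESTION_FIELD, new StringReader(userInput));
            CharTermAttribute termAttr = tokens.addAttribute(CharTermAttribute.class);
            int tokenCount = 0;
            try {
                tokens.reset();
                while (tokens.incrementToken()) {
                    perLocale.add(
                            new TermQuery(new Term(SUGGESTION_FIELD, termAttr.toString())),
                            Occur.MUST);
                    tokenCount++;
                }
                tokens.end();
            } finally {
                tokens.close();
            }
            if (tokenCount == 0) {
                // The analyzer dropped everything (e.g. stop words only). Skip this
                // locale, otherwise the clause would match every document in it.
                continue;
            }
            perLocale.add(new TermQuery(new Term(LOCALE, entry.getKey())), Occur.MUST);
            topLevel.add(perLocale, Occur.SHOULD);
        }
        return topLevel;
    }
}
```

The result can be passed straight to the searcher, for example `searcher.search(MultiLocaleQueryBuilder.build(queryString, analyzers), 100)`, where `analyzers` maps each selected locale to the analyzer used at index time. Later Lucene versions build BooleanQuery through a builder class instead, so this would need adapting there.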
