In the Lucene query syntax I’d like to combine * and ~ in a valid query


Editorial Team

Asked: May 14, 2026


In the Lucene query syntax I’d like to combine * and ~ in a valid query similar to:
bla~* //invalid query

Meaning: Please match words that begin with “bla” or something similar to “bla”.

Update:
What I do now, which works for small input, is use the following (a snippet of my Solr schema):

```xml
<fieldtype name="text_ngrams" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: emit every 2..15 character prefix of each token as its own term -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query time: no n-grams; the whole query term is compared against the indexed prefixes -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

In case you don’t use Solr, this does the following:

Index time: index the data into a field containing all prefixes (2 to 15 characters long) of each token of my (short) input.

Search time: use only the ~ operator, since the prefixes are explicitly present in the index.
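To make the update concrete, here is a toy Python model of what the text_ngrams field type does. It is not Solr or Lucene code: it assumes whitespace tokenization plus lowercasing (WordDelimiterFilterFactory is ignored) and uses a plain Levenshtein distance with a max_edits budget as a stand-in for the ~ operator. All function names are hypothetical.

```python
# Toy model of the "text_ngrams" field type; NOT Solr/Lucene code.
# Simplifying assumptions:
#   * tokenization = whitespace split + lowercasing (WordDelimiterFilter ignored)
#   * front edge n-grams of length 2..15, as in minGramSize/maxGramSize
#   * plain Levenshtein distance with a max_edits budget stands in for "~"


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = cur
    return prev[-1]


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Every prefix of token whose length lies in [min_gram, max_gram]."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_terms(text: str) -> set[str]:
    """Index time: collect the prefixes of every lowercased token."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


def fuzzy_matches(query_term: str, terms: set[str], max_edits: int = 2) -> bool:
    """Search time: an ordinary fuzzy comparison against the indexed prefixes."""
    q = query_term.lower()
    return any(levenshtein(q, t) <= max_edits for t in terms)


print(sorted(index_terms("Blanket")))  # ['bl', 'bla', 'blan', 'blank', 'blanke', 'blanket']
print(fuzzy_matches("bla", index_terms("Plateau"), max_edits=1))  # True, via the prefix "pla"
print(fuzzy_matches("bla", index_terms("xyzzy")))  # False
```

Because every prefix is indexed as a separate term, a plain fuzzy comparison against bla behaves like a fuzzy prefix match. The price is a much larger term dictionary, which is presumably why this only works well for short input.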


1 Answer

Editorial Team · Answer 1 · 2026-05-14T05:11:55+00:00

I do not believe Lucene supports anything like this, nor do I believe it has a trivial solution.

“Fuzzy” searches do not operate on a fixed number of characters. bla~ may, for example, match blah, so the comparison has to consider the entire term rather than just its first few characters.
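A small worked example shows why the whole term matters. Using plain Levenshtein distance d, where each insertion, deletion or substitution costs 1:

```latex
\begin{aligned}
d(\texttt{bla}, \texttt{blah})    &= 1 && \text{(insert h)} \\
d(\texttt{bla}, \texttt{blanket}) &= 4 && \text{(insert n, k, e, t)}
\end{aligned}
```

Since recent Lucene versions cap fuzzy matching at two edits, bla~ can reach blah but not blanket, even though both words begin with bla.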

What you could do is implement a query expansion algorithm that takes the query bla~* and converts it into a series of OR'd prefix queries:

bla* OR blb* OR blc* OR ... etc.

But that is really only viable if the string is very short or if you can narrow the expansion based on some rules.
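A minimal sketch of such an expansion, assuming the input is limited to lowercase ASCII letters and only a single edit is allowed (the helper names are hypothetical). It enumerates every string within one deletion, substitution or insertion of the prefix and emits each one as a prefix clause in Lucene query syntax.

```python
import string


def edits1(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """word itself plus every string one delete/substitute/insert away from it."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutes = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return {word} | deletes | substitutes | inserts


def expand_fuzzy_prefix(prefix: str) -> str:
    """Rewrite the unsupported 'prefix~*' as an OR of plain prefix queries."""
    variants = sorted(edits1(prefix.lower()))
    return " OR ".join(f"{v}*" for v in variants)


clauses = expand_fuzzy_prefix("bla").split(" OR ")
print(len(clauses))       # 180
print("blb*" in clauses)  # True
```

Even for the three-letter bla, one edit already yields 180 clauses, which illustrates why the expansion only stays practical for very short strings or when extra rules prune the variants.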

Alternatively, if the length of the prefix is fixed, you could add a field containing those prefixes and perform the fuzzy search on that field. That would give you what you want, but it will only work if your use case is sufficiently narrow.

You don’t specify exactly why you need this; perhaps doing so will elicit other solutions.

One scenario I can think of is dealing with different forms of a word, e.g. finding both car and cars.

This is easy in English, as word stemmers are available. In other languages it can be quite difficult, if not impossible, to implement a word stemmer.

In this scenario, however, you can (assuming you have access to a good dictionary) look up the search term and programmatically expand the search to cover all forms of the word.

E.g. a search for cars is translated into car OR cars. This has been applied successfully for my language in at least one search engine, but is obviously non-trivial to implement.
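The dictionary-based expansion could be sketched like this, assuming a hypothetical in-memory LEXICON that maps each lemma to its forms; a real system would load it from a proper lexicon for the target language, and the function names are illustrative rather than an existing API.

```python
# Hypothetical lexicon: lemma -> all of its surface forms.
LEXICON = {
    "car": ["car", "cars"],
    "mouse": ["mouse", "mice"],
}


def build_form_index(lexicon: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every surface form to the full set of forms of its lemma(s)."""
    index: dict[str, set[str]] = {}
    for forms in lexicon.values():
        for form in forms:
            # A form shared by several lemmas collects the forms of all of them.
            index.setdefault(form, set()).update(forms)
    return index


def expand_word_forms(term: str, form_index: dict[str, set[str]]) -> str:
    """Rewrite a term as an OR over all forms of its word; unknown terms pass through."""
    key = term.lower()
    forms = form_index.get(key, {key})
    return " OR ".join(sorted(forms))


FORM_INDEX = build_form_index(LEXICON)
print(expand_word_forms("cars", FORM_INDEX))  # car OR cars
print(expand_word_forms("mice", FORM_INDEX))  # mice OR mouse
print(expand_word_forms("bike", FORM_INDEX))  # bike
```

Keeping the expansion on the query side means the index stays unchanged, but the quality of the results depends entirely on the coverage of the dictionary.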
