I am new to Lucene and I would really appreciate an example on how

Asked: June 9, 2026 (2026-06-09T07:21:42+00:00)

I am new to Lucene and I would really appreciate an example of how to get bigram and trigram tokens into the index.

I’m using the following code, which I have modified to calculate term frequencies and weights, but I need to do the same for bigrams and trigrams. I can’t see where the tokenization happens! I searched online, and some of the suggested classes do not exist in Lucene 3.4.0 because they have been deprecated.

Any suggestions please?

Thanks,
Moe

EDIT:

Now I’m using the NGramTokenFilter, as mbonaci suggested.
This is the part of the code where I tokenize a text to get the unigrams, bigrams and trigrams. But the n-grams are being built at the character level rather than the word level.

Instead of:
[H][e][l][l][o][HE][EL] etc.

I’m looking for: [Hello][World][Hello World]

        int min = 1;
        int max = 3;
        WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_34);
        String text = "hello my world";
        // Split the text into whitespace-separated word tokens.
        TokenStream tokenStream = analyzer.tokenStream("Data", new StringReader(text));

        // NGramTokenFilter builds n-grams from the characters inside each token,
        // which is why the output is character-level rather than word-level.
        NGramTokenFilter myfilter = new NGramTokenFilter(tokenStream, min, max);
        OffsetAttribute offsetAttribute2 = myfilter.addAttribute(OffsetAttribute.class);
        CharTermAttribute charTermAttribute2 = myfilter.addAttribute(CharTermAttribute.class);
        while (myfilter.incrementToken()) {
            int startOffset = offsetAttribute2.startOffset();
            int endOffset = offsetAttribute2.endOffset();
            String term = charTermAttribute2.toString();
            System.out.println(term);
        }

1 Answer

Editorial Team · Answer 1 · 2026-06-09T07:21:43+00:00

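You need to look at shingles, which are word-level n-grams (in Lucene, see the ShingleFilter class in org.apache.lucene.analysis.shingle). That article shows how to do it.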
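To make the idea concrete, here is a minimal plain-Java sketch of what word-level shingling produces, together with a simple term-frequency count over the result. It deliberately does not use Lucene: the class WordShingles and its methods shingles and termFrequencies are hypothetical names made up for this illustration, and splitting on whitespace is an assumption that mirrors the WhitespaceAnalyzer in the question. In a real index you would most likely wrap the token stream in Lucene's ShingleFilter instead of hand-rolling this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; this is not a Lucene class.
class WordShingles {

    // Returns every word n-gram of size min..max, grouped by start position.
    // Words are split on whitespace (an assumption matching WhitespaceAnalyzer).
    static List<String> shingles(String text, int min, int max) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            // Grow the n-gram one word at a time, up to max words.
            for (int n = 1; n <= max && start + n <= words.length; n++) {
                if (n > 1) {
                    sb.append(' ');
                }
                sb.append(words[start + n - 1]);
                if (n >= min) {
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }

    // Counts how often each term occurs, keeping first-seen order.
    static Map<String, Integer> termFrequencies(List<String> terms) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String term : terms) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        // Unigrams, bigrams and trigrams of the text from the question.
        List<String> terms = shingles("hello my world", 1, 3);
        for (Map.Entry<String, Integer> e : termFrequencies(terms).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

For "hello my world" with sizes 1 to 3 this yields the terms hello, hello my, hello my world, my, my world and world, which is the word-level output the question asks for rather than character n-grams.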
