The Archive Base


Editorial Team
Asked: June 9, 2026


I am new to Lucene and would really appreciate an example of how to get bigram and trigram tokens into the index.

I’m using the following code, which I have modified to calculate term frequencies and weights, but I need to do the same for bigrams and trigrams. I can’t see where the tokenization happens! I searched online, and some of the suggested classes do not exist in Lucene 3.4.0 because they have been deprecated.

Any suggestions, please?

Thanks,
Moe

EDIT:

Now I’m using NGramTokenFilter, as mbonaci suggested.
This is the part of the code where I tokenize a text to get the unigrams, bigrams and trigrams, but it works at the character level rather than the word level.

Instead of:
[H][e][l][l][o][He][el] etc.

I’m looking for: [Hello][World][Hello World]
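To make the difference concrete, here is a small plain-Java sketch (no Lucene involved) that produces both kinds of n-grams: character n-grams, which is what NGramTokenFilter emits for each token, and word n-grams, which is the desired [Hello][World][Hello World] output. The class NGramDemo and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class: contrasts character n-grams with word n-grams.
public class NGramDemo {

    // Character n-grams of one word, grouped by size: all 1-grams, then all 2-grams, ...
    public static List<String> charNGrams(String word, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    // Word n-grams of a whitespace-separated text, joined with a single space.
    public static List<String> wordNGrams(String text, int min, int max) {
        String[] words = text.trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= words.length; i++) {
                grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("Hello", 1, 2));       // prints [H, e, l, l, o, He, el, ll, lo]
        System.out.println(wordNGrams("Hello World", 1, 2)); // prints [Hello, World, Hello World]
    }
}
```

In Lucene terminology, word n-grams are called shingles. Note that Lucene's ShingleFilter typically emits them in position order (hello, hello world, world) rather than grouped by size as this sketch does.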

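        import java.io.StringReader;
        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.WhitespaceAnalyzer;
        import org.apache.lucene.analysis.ngram.NGramTokenFilter;
        import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
        import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
        import org.apache.lucene.util.Version;

        int min = 1;
        int max = 3;
        WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_34);
        String text = "hello my world";
        TokenStream tokenStream = analyzer.tokenStream("Data", new StringReader(text));

        // NGramTokenFilter splits each whitespace token into character n-grams (h, e, he, ...)
        NGramTokenFilter myfilter = new NGramTokenFilter(tokenStream, min, max);
        OffsetAttribute offsetAttribute2 = myfilter.addAttribute(OffsetAttribute.class);
        CharTermAttribute charTermAttribute2 = myfilter.addAttribute(CharTermAttribute.class);
        myfilter.reset();
        while (myfilter.incrementToken()) {
            int startOffset = offsetAttribute2.startOffset();
            int endOffset = offsetAttribute2.endOffset();
            String term = charTermAttribute2.toString();
            System.out.println(term);
        }
        myfilter.end();
        myfilter.close();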

1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 7:21 am

    You need to look at shingles, which are word-level n-grams. That article shows how to do it.
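As a minimal sketch of the shingle approach applied to the question's own pipeline, the code below swaps NGramTokenFilter for ShingleFilter and counts how often each unigram, bigram and trigram occurs, since the question also needs term frequencies. It assumes Lucene 3.4 with the analyzers contrib jar (which contains org.apache.lucene.analysis.shingle.ShingleFilter) on the classpath; the class ShingleDemo and the method shingleFrequencies are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class; assumes Lucene 3.4 core + analyzers jars.
public class ShingleDemo {

    // Counts the 1-, 2- and 3-word tokens that ShingleFilter produces for the text.
    public static Map<String, Integer> shingleFrequencies(String text) throws IOException {
        WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_34);
        TokenStream words = analyzer.tokenStream("Data", new StringReader(text));

        // maxShingleSize = 3; the minimum shingle size defaults to 2, and
        // unigrams are kept as well, so we get unigrams, bigrams and trigrams.
        ShingleFilter shingles = new ShingleFilter(words, 3);
        shingles.setOutputUnigrams(true);
        shingles.setTokenSeparator(" ");
        CharTermAttribute term = shingles.addAttribute(CharTermAttribute.class);

        Map<String, Integer> freqs = new LinkedHashMap<String, Integer>();
        shingles.reset();
        while (shingles.incrementToken()) {
            String t = term.toString();
            Integer old = freqs.get(t);
            freqs.put(t, old == null ? 1 : old + 1);
        }
        shingles.end();
        shingles.close();
        return freqs;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(shingleFrequencies("hello my world hello my"));
    }
}
```

To get the shingles into the index rather than only printing them, the same filter chain has to be part of the Analyzer used at indexing time; Lucene's ShingleAnalyzerWrapper is one way to wrap an existing analyzer like this.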


Related Questions

I have the following code and would appreciate your advice. QueryParser queryParser = new
I have written a following code in my project: final IndexSearcher indexSearcher = new
I use NumericField write a Integer in lucene Index: doc.add( new NumericField(id,Integer.MAX_VALUE,Field.Store.YES,true) .setIntValue(123) );
ok, I'm totally new to SOLR and Lucene, but have got Solr running out-of-the-box
I am trying to index a table in a database using Lucene. I use
I have a system that uses lucene. Now for a few reasons I would
Working with a Lucene index, I have a standard document format that looks something
I am trying to create a new Lucene index on a site running Sitecore
I am using Lucene.NET and I would like to check before whether a document
I'm new to solr and i'm trying to index some files using solrj. I


© 2021 The Archive Base. All Rights Reserved