I’m starting from a Lucene index which someone else created. I’d like to find


Asked: May 12, 2026 (2026-05-12T07:58:26+00:00)


I’m starting from a Lucene index which someone else created. I’d like to find all of the words that follow a given word. I’ve extracted the term (org.apache.lucene.index.Term) of interest from the index, and I can find the documents which contain that term:

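import org.apache.lucene.document.Document;
import org.apache.lucene.index.TermDocs;

// Iterate over every document that contains the term (pre-4.0 Lucene API)
TermDocs segmentTermDocs = segmentReader.termDocs(term);
while (segmentTermDocs.next()) {
        Document doc = segmentReader.document(segmentTermDocs.doc());
...
}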

Is there a way for me to locate the positions of the term in the document and extract the terms which follow it?


1 Answer

Editorial Team · Answer 1 · 2026-05-12T07:58:27+00:00

Since indexing the n-grams isn’t an option in your situation, some brute force will be required. You could enumerate the IndexReader’s terms and walk each term’s positions with termPositions, but that would likely be excruciatingly slow.

A faster approach would be to implement a divide-and-conquer search: enumerate the terms and use a MultiPhraseQuery to test a whole group at once. Split all the candidate terms into reasonably sized groups (say 1000), and run a MultiPhraseQuery search with each group and your prefix word. If a group produces any hits, recurse on its sub-groups until you reach a single term.
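The recursion described above can be sketched in plain Java. This is only an illustration of the narrowing logic, not a Lucene implementation: the class name FollowerSearch, the method findFollowers, and the anyFollows predicate are all hypothetical. The predicate stands in for "a MultiPhraseQuery built from the prefix word followed by any term in this group returns at least one hit"; against a real index you would presumably back it with an actual search, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class FollowerSearch {

    /**
     * Splits the candidates into chunks and narrows each positive chunk
     * down to the individual terms that follow the prefix word.
     * anyFollows is a hypothetical stand-in for the index lookup.
     */
    static List<String> findFollowers(List<String> candidates,
                                      Predicate<List<String>> anyFollows,
                                      int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> found = new ArrayList<>();
        for (int start = 0; start < candidates.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, candidates.size());
            narrow(candidates.subList(start, end), anyFollows, found);
        }
        return found;
    }

    // Tests the whole group once; if it has a hit, halves it and recurses.
    private static void narrow(List<String> group,
                               Predicate<List<String>> anyFollows,
                               List<String> found) {
        if (group.isEmpty() || !anyFollows.test(group)) {
            return; // no term in this group follows the prefix word
        }
        if (group.size() == 1) {
            found.add(group.get(0)); // a single term with a hit is a follower
            return;
        }
        int mid = group.size() / 2;
        narrow(group.subList(0, mid), anyFollows, found);
        narrow(group.subList(mid, group.size()), anyFollows, found);
    }

    public static void main(String[] args) {
        // Toy data: the set plays the role of the index (assumption for the demo).
        List<String> vocabulary = List.of(
                "brown", "cat", "dog", "fox", "jumps", "lazy", "over", "quick");
        Set<String> actualFollowers = Set.of("brown", "lazy");

        List<String> result = findFollowers(
                vocabulary,
                group -> group.stream().anyMatch(actualFollowers::contains),
                4);
        System.out.println(result); // prints [brown, lazy]
    }
}
```

Each group costs one query, so large groups with no followers are discarded in a single search, while groups with hits are halved until single terms remain. The benefit depends on followers being sparse in the vocabulary; if most terms follow the prefix word, this degrades toward one query per term plus the overhead of the group queries.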
