Lucene term vector offsets are invalid when a field is analyzed with a custom analyzer

Question

Asked: May 23, 2026 (2026-05-23T13:33:44+00:00)

I have a problem with Lucene term vector offsets: when I analyze a field with my custom analyzer, the term vector gets invalid offsets, but everything is fine with the standard analyzer. Here is my analyzer code:

public class AttachmentNameAnalyzer extends Analyzer {
    private boolean stemmTokens;
    private String name;

    public AttachmentNameAnalyzer(boolean stemmTokens, String name) {
        super();
        this.stemmTokens    = stemmTokens;
        this.name           = name;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new AttachmentNameTokenizer(reader);
        if (stemmTokens)
            stream = new SnowballFilter(stream, name);
        return stream;
    }

    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        TokenStream stream = (TokenStream) getPreviousTokenStream();

        if (stream == null) {
            stream = new AttachmentNameTokenizer(reader);
            if (stemmTokens)
                stream = new SnowballFilter(stream, name);
            setPreviousTokenStream(stream);
        } else if (stream instanceof Tokenizer) {
            ( (Tokenizer) stream ).reset(reader);
        }

        return stream;
    }
}

What is wrong with this? Help required.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T13:33:45+00:00

The problem is in the analyzer code I posted earlier: the token stream needs to be reset for every new piece of text that is tokenized.

    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        TokenStream stream = (TokenStream) getPreviousTokenStream();

        if (stream == null) {
            stream = new AttachmentNameTokenizer(reader);
            if (stemmTokens)
                stream = new SnowballFilter(stream, name);
            setPreviousTokenStream(stream); // <-- the problem was here
        } else if (stream instanceof Tokenizer) {
            ( (Tokenizer) stream ).reset(reader);
        }

        return stream;
    }

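Whenever I saved the previous token stream with setPreviousTokenStream, the next text field, which has to be tokenized separately, always started at the end offset of the last token stream. That made the term vector offsets wrong for the new stream. With the call removed, the stream is no longer saved, so a fresh tokenizer is created for each field, and it now works fine like this: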

    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        TokenStream stream = (TokenStream) getPreviousTokenStream();

        if (stream == null) {
            stream = new AttachmentNameTokenizer(reader);
            if (stemmTokens)
                stream = new SnowballFilter(stream, name);
        } else if (stream instanceof Tokenizer) {
            ( (Tokenizer) stream ).reset(reader);
        }

        return stream;
    }
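
To make the symptom easier to see, here is a minimal sketch in plain Java, not Lucene. It assumes the tokenizer keeps its own running offset counter and that swapping in a new reader does not clear it; the source of AttachmentNameTokenizer is not shown in the question, so that is only a guess at the cause. The names OffsetReuseDemo, SimpleTokenizer and clearOffsetOnReset are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a reused tokenizer. These are NOT Lucene classes.
final class OffsetReuseDemo {

    /** Hypothetical whitespace tokenizer that records {start, end} offsets. */
    static final class SimpleTokenizer {
        private Reader input;
        private int offset; // characters consumed so far
        private final boolean clearOffsetOnReset;

        SimpleTokenizer(Reader input, boolean clearOffsetOnReset) {
            this.input = input;
            this.clearOffsetOnReset = clearOffsetOnReset;
        }

        /** Points the tokenizer at new text, as a reuse path would. */
        void reset(Reader newInput) {
            this.input = newInput;
            if (clearOffsetOnReset) {
                offset = 0; // the step a stateful tokenizer must not skip
            }
        }

        /** Reads the whole input and returns one {start, end} pair per token. */
        List<int[]> tokenize() throws IOException {
            List<int[]> offsets = new ArrayList<>();
            int start = -1; // -1 means "not inside a token"
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (start >= 0) {
                        offsets.add(new int[] {start, offset});
                        start = -1;
                    }
                } else if (start < 0) {
                    start = offset;
                }
                offset++;
            }
            if (start >= 0) {
                offsets.add(new int[] {start, offset});
            }
            return offsets;
        }
    }

    /** Reuses one tokenizer for two texts; returns the start offset of the second text's first token. */
    static int firstStartOfSecondText(boolean clearOffsetOnReset) {
        try {
            SimpleTokenizer t =
                new SimpleTokenizer(new StringReader("report.pdf final"), clearOffsetOnReset);
            t.tokenize();
            t.reset(new StringReader("notes.txt"));
            return t.tokenize().get(0)[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without clearing: " + firstStartOfSecondText(false));
        System.out.println("with clearing:    " + firstStartOfSecondText(true));
    }
}
```

Without clearing, the first token of the second text starts at 16, the length of the first text, which matches the "starts with the end offset of the last token stream" behaviour. With clearing it starts at 0. If the real tokenizer behaves like this model, resetting its offset state whenever it receives a new reader might allow the stream to be reused instead of rebuilt for every field, but that is untested against the actual classes.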
