I have a problem with Lucene term vector offsets: when I analyze a field with my custom analyzer, the term vector gets invalid offsets, but it is fine with the standard analyzer. Here is my analyzer code:
public class AttachmentNameAnalyzer extends Analyzer {

    private boolean stemmTokens;
    private String name;

    public AttachmentNameAnalyzer(boolean stemmTokens, String name) {
        super();
        this.stemmTokens = stemmTokens;
        this.name = name;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new AttachmentNameTokenizer(reader);
        if (stemmTokens)
            stream = new SnowballFilter(stream, name);
        return stream;
    }

    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        TokenStream stream = (TokenStream) getPreviousTokenStream();
        if (stream == null) {
            stream = new AttachmentNameTokenizer(reader);
            if (stemmTokens)
                stream = new SnowballFilter(stream, name);
            setPreviousTokenStream(stream);
        } else if (stream instanceof Tokenizer) {
            ((Tokenizer) stream).reset(reader);
        }
        return stream;
    }
}
What is wrong with this? Help required.
The problem is with the analyzer whose code I posted earlier: the token stream needs to be reset for every new piece of text that is to be tokenized.
Every time I set the previous token stream, the next text field has to be tokenized separately, but the reused stream always started from the end offset of the last token stream, which made the term vector offsets wrong for the new stream. It now works fine like this:
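The following is not the poster's actual fix, only a minimal self-contained sketch of the failure described above. It assumes the custom tokenizer keeps a running character offset that survives reuse. ToyTokenizer, resetInputOnly and tokenOffsets are hypothetical names invented for this illustration and are not part of Lucene; the source of AttachmentNameTokenizer is not shown, so its real behaviour may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable whitespace tokenizer; NOT a Lucene class.
class ToyTokenizer {
    private String input;
    private int pos = 0;    // read position inside the current input
    private int offset = 0; // running character offset reported for tokens

    ToyTokenizer(String input) {
        this.input = input;
    }

    // Incomplete reset: swaps the input but keeps the running offset
    // (assumed to mirror the behaviour described in the post).
    void resetInputOnly(String input) {
        this.input = input;
        this.pos = 0;
    }

    // Full reset: swaps the input AND clears the per-stream state.
    void reset(String input) {
        this.input = input;
        this.pos = 0;
        this.offset = 0;
    }

    // Returns {startOffset, endOffset} for each whitespace-separated token.
    List<int[]> tokenOffsets() {
        List<int[]> out = new ArrayList<int[]>();
        int start = -1;
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (Character.isWhitespace(c)) {
                if (start >= 0) {
                    out.add(new int[] {start, offset});
                    start = -1;
                }
            } else if (start < 0) {
                start = offset;
            }
            offset++;
        }
        if (start >= 0) {
            out.add(new int[] {start, offset});
        }
        return out;
    }
}
```

In this toy model, tokenizing "ab cd" and then reusing the tokenizer for "ef" through resetInputOnly reports the offsets of "ef" as 5 to 7, continuing from the end of the previous text, while reset reports 0 to 2. If the real tokenizer behaves similarly, the analogous change would be to clear its offset counter whenever it is handed a new reader, and to make sure reusableTokenStream actually reaches the underlying tokenizer even when it is wrapped in a SnowballFilter.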