I’m trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream.
The worst part is that I’m looking at the comments in the JavaDocs that address my question.
Somehow, an AttributeSource is supposed to be used, rather than Tokens. I’m totally at a loss.
Can anyone explain how to get token-like information from a TokenStream?
Yeah, it’s a little convoluted (compared to the good ol’ way), but this should do it:
Edit: The new way
According to Donotello, TermAttribute has been deprecated in favor of CharTermAttribute. According to jpountz (and Lucene’s documentation), addAttribute is more desirable than getAttribute.
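For anyone landing here later, this is a minimal sketch of what the attribute-based loop can look like. It is not necessarily the exact code from the original answer. It assumes Lucene 5 or later (so that the no-argument StandardAnalyzer constructor and tokenStream(String, String) are available) with the Lucene jars on the classpath. The class name TokenStreamDemo, the tokenize helper, and the field name "body" are made up for illustration. The reason addAttribute is the safer call: it returns the existing attribute instance or registers a new one, whereas getAttribute throws an IllegalArgumentException if the stream does not already carry that attribute.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical demo class; the name and helper are illustrative only.
public class TokenStreamDemo {

    // Returns each token as "term[start,end)" using character offsets.
    static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        // "body" is an arbitrary field name chosen for this sketch.
        try (Analyzer analyzer = new StandardAnalyzer();
             TokenStream stream = analyzer.tokenStream("body", text)) {
            // Ask for the attributes once, before consuming the stream.
            // The same instances are updated in place for every token.
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = stream.addAttribute(OffsetAttribute.class);

            stream.reset();                    // required before incrementToken()
            while (stream.incrementToken()) {  // advances to the next token
                tokens.add(term.toString()
                        + "[" + offset.startOffset() + "," + offset.endOffset() + ")");
            }
            stream.end();                      // finalizes end-of-stream state
        }                                      // closes the stream, then the analyzer
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        for (String token : tokenize("Hello Lucene world")) {
            System.out.println(token);
        }
    }
}
```

Note that the attribute objects are reused across tokens, so copy out anything you want to keep (term.toString() above) inside the loop instead of holding on to the attribute itself.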