I am looking for all the delimiters on which the Java Lucene StandardAnalyzer tokenizes

Editorial Team

Asked: May 22, 2026

I am looking for all the delimiters on which the Java Lucene StandardAnalyzer tokenizes the input string.

I need to know every delimiter that is used for tokenizing by default.

1 Answer

Editorial Team · Answer 1 · 2026-05-22T22:21:05+00:00

I know from Lucene in Action that every character other than a-z, A-Z, or a variant of those letters with diacritics is used as a delimiter, and that includes digits.

So “Mc’Donald” might be split into “Mc” and “Donald”, “Web2.0” might be tokenized as just “Web”, and so on.
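
A minimal sketch of the rule described above, in plain Java with no Lucene dependency. It is a hypothetical illustration, not Lucene's code: the class name LetterRuleSketch is made up, and the regex class \p{L} accepts any Unicode letter, which is somewhat broader than "a-zA-Z plus diacritics". It shows why digits, apostrophes and dots would all split tokens under this rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of a letter-only tokenization rule: any character
 * that is not a Unicode letter acts as a delimiter, so digits, apostrophes
 * and punctuation all split tokens. This is NOT Lucene's StandardTokenizer.
 */
final class LetterRuleSketch {

    static List<String> tokenize(String text) {
        // Split on runs of non-letters; \p{L} also matches accented letters such as é.
        return Arrays.stream(text.split("[^\\p{L}]+"))
                // A leading delimiter yields an empty first element, so drop empties.
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Mc'Donald")); // [Mc, Donald]
        System.out.println(tokenize("Web2.0"));    // [Web]
    }
}
```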

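The best approach is to run a test: feed the analyzer all kinds of characters and post your results here. The exact behavior depends on your Lucene version; since Lucene 3.1, StandardTokenizer (which StandardAnalyzer uses) follows the word-break rules of Unicode Standard Annex #29, so your results may differ from the book's description.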
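
A minimal sketch of such a test, assuming Lucene 5 or later is on the classpath (where StandardAnalyzer has a no-argument constructor). It runs StandardAnalyzer over a few probe strings and prints the tokens it emits, so you can see which characters acted as delimiters. The class name StandardAnalyzerProbe, the placeholder field name "field", and the probe strings are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class StandardAnalyzerProbe {

    /** Runs the analyzer over text and collects every emitted term. */
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        // "field" is a placeholder field name; StandardAnalyzer treats all fields alike.
        try (TokenStream stream = analyzer.tokenStream("field", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // required before the first incrementToken()
            while (stream.incrementToken()) {
                tokens.add(term.toString());
            }
            stream.end();                   // close() is called by try-with-resources
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String[] probes = {
            "Mc'Donald", "Web2.0", "AT&T", "foo@example.com",
            "C++ and C#", "e-mail", "3.14", "under_score",
            "Caf\u00e9" // accented letter
        };
        try (Analyzer analyzer = new StandardAnalyzer()) {
            for (String probe : probes) {
                System.out.println(probe + " -> " + tokenize(analyzer, probe));
            }
        }
    }
}
```

StandardAnalyzer also lowercases its output, and some Lucene versions remove English stop words by default, so a token missing from the output is not necessarily the result of a delimiter.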
