I’ve started playing with Lucene.NET today and I wrote a simple test method to

Question


Asked: May 16, 2026 (2026-05-16T18:34:18+00:00)


I’ve started playing with Lucene.NET today and I wrote a simple test method to do indexing and searching on source code files. The problem is that the standard analyzers/tokenizers treat a whole camel-case identifier as a single token.

I’m looking for a way to split camel-case identifiers like MaxWidth into three tokens: maxwidth, max and width. I’ve looked for such a tokenizer, but I couldn’t find one. Before writing my own: does something along these lines already exist? Or is there a better approach than writing a tokenizer from scratch?

UPDATE: in the end I decided to get my hands dirty and I wrote a CamelCaseTokenFilter myself. I’ll write a post about it on my blog and I’ll update the question.


1 Answer

Editorial Team · Answer 1 · 2026-05-16T18:34:18+00:00


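Solr has a WordDelimiterFilterFactory, which creates a token filter (WordDelimiterFilter, not a tokenizer) similar to what you need: among other things, it can split words on case changes, so MaxWidth yields Max and Width. Maybe you can translate the source code into C#.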
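The case-change splitting at the heart of such a filter can be sketched without any Lucene dependency. The example below is only an illustration in plain Java (the language of the Solr source), not the Solr implementation and not the asker's CamelCaseTokenFilter; the class name `CamelCaseSplitter` and its `split` method are hypothetical. It assumes identifiers made of letters only and ignores digits and underscores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or Solr: splits a camel-case
// identifier into the lowercased whole word plus its lowercased parts.
final class CamelCaseSplitter {

    static List<String> split(String identifier) {
        List<String> tokens = new ArrayList<>();
        if (identifier.isEmpty()) {
            return tokens;
        }
        // Always emit the whole identifier first, e.g. "maxwidth".
        tokens.add(identifier.toLowerCase(Locale.ROOT));

        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < identifier.length(); i++) {
            char prev = identifier.charAt(i - 1);
            char cur = identifier.charAt(i);
            // Boundary between a lowercase and an uppercase letter: "max|Width".
            boolean lowerToUpper = Character.isLowerCase(prev) && Character.isUpperCase(cur);
            // Boundary at the end of an acronym: "XML|Parser".
            boolean acronymEnd = Character.isUpperCase(prev) && Character.isUpperCase(cur)
                    && i + 1 < identifier.length()
                    && Character.isLowerCase(identifier.charAt(i + 1));
            if (lowerToUpper || acronymEnd) {
                parts.add(identifier.substring(start, i));
                start = i;
            }
        }
        parts.add(identifier.substring(start));

        // Emit the parts only if the identifier was actually split,
        // so a plain word is not indexed twice.
        if (parts.size() > 1) {
            for (String part : parts) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Calling `CamelCaseSplitter.split("MaxWidth")` returns the three tokens from the question: maxwidth, max and width. In a real token filter, the extra parts would typically be emitted at the same position as the whole word, so that phrase queries keep working; that wiring is left out here.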
