I am using Lucene.Net 2.9.2 and I reckon I will need to write a

Question


Asked: May 26, 2026


I am using Lucene.Net 2.9.2 and I reckon I will need to write a custom tokenizer but wanted to check in case I am missing something obvious.

Each document consists of Title, Keywords and Content, plus some metadata (author, date, etc.), each stored as a field. The documents are software technical documents and may contain terms such as ‘.Net’, ‘C++’ and ‘C#’ in the title, keywords and/or content.

I’m using the KeywordAnalyzer for the Keywords field and StandardAnalyzer for Title and Content – stop-word removal, lowercasing etc. are necessary as the documents can be very long.

I have also written a custom synonym filter for search, because I want a search for, say, ‘C#’ to also recognise ‘CSharp’, ‘C#.Net’ etc. The problem is that by that point the tokenizer has already removed the ‘#’ from ‘C#’ and the ‘++’ from ‘C++’, so those terms can be confused with, say, a reference to the ‘C’ language.

My thought is that when I index Title and Content, I need to branch the tokenization depending on whether the current token is part of one of the keyword phrases or any of its synonyms.

Is that the best approach? Many thanks in advance 🙂


1 Answer

Editorial Team · Answer 1 · 2026-05-26T20:54:01+00:00


I think you can use a WhitespaceTokenizer, then plug in a KeywordMarkerFilter to mark some tokens as ‘inviolable’, and finally supply your own filter that strips punctuation characters from the remaining tokens. Maybe someone with knowledge of Lucene.Net will suggest something more specific; in Solr, for example, WordDelimiterFilter could be used.
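To make the idea concrete, here is a rough sketch of that three-step logic in plain Java. It deliberately does not use the Lucene or Lucene.Net API; the class name `ProtectedTermTokenizer`, the `PROTECTED` term list and the choice of which trailing punctuation to trim are all assumptions made for illustration. In a real analyzer the same steps would presumably live in a tokenizer plus custom token filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: mimics "whitespace tokenize -> protect known terms ->
// strip punctuation from everything else" without depending on Lucene.
class ProtectedTermTokenizer {

    // Hypothetical list of terms whose punctuation must survive (lowercased).
    static final Set<String> PROTECTED =
            new HashSet<>(Arrays.asList("c#", "c++", ".net", "c#.net"));

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split on whitespace only, so '#' and '+' are still attached.
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            // Drop trailing sentence punctuation so "C#," still matches "c#".
            String trimmed = lower.replaceAll("[,;:!?.]+$", "");
            // Step 2: protected terms pass through untouched.
            if (PROTECTED.contains(trimmed)) {
                tokens.add(trimmed);
                continue;
            }
            // Step 3: every other token loses all non-letter, non-digit characters.
            String stripped = lower.replaceAll("[^\\p{L}\\p{N}]+", "");
            if (!stripped.isEmpty()) {
                tokens.add(stripped);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("We use C#, C++ and .Net daily."));
        System.out.println(tokenize("Plain C code"));
    }
}
```

With this, ‘C#’ and ‘C++’ stay distinct from a bare ‘C’, which is what the synonym filter needs downstream. Note that step 3 deletes punctuation rather than splitting on it, so a hyphenated word collapses into a single token; whether that is acceptable depends on your content.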
