For my Solr implementation, I want a query to return words both with and without diacritics, regardless of whether the search term itself contains diacritics.
For example:
The search word is “çest” – Solr returns: ‘cest‘, ‘çest‘ and ‘çest ca‘
The search word is “cest” – Solr returns: ‘cest‘, ‘çest‘ and ‘çest ca‘
Currently only the first case works. When I search for “çest”, it returns both ‘cest’ and ‘çest’. However, when I search for “cest”, it returns only ‘cest’.
This is how it looks in my schema:
<fieldType name="text_special_search" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\W+" replacement="-"/>
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
</analyzer>
</fieldType>
Is there a way to let it work both ways?
If you want matches either way, you don’t need the solr.PatternReplaceCharFilterFactory. It would remove the special characters before they are passed to the ASCII folding filter.
You can use:
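A minimal sketch of what the simplified field type might look like, assuming the only change is dropping the char filter and keeping the tokenizer and ASCII folding filter from the question (the comments are illustrative, not part of the original schema):

```xml
<fieldType name="text_special_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits the text into tokens at non-letters and lowercases each token -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- Folds accented characters to their ASCII equivalents, e.g. ç becomes c -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a single analyzer element applies at both index and query time, “çest” and “cest” should both end up as the token “cest” on either side. Existing documents would need to be reindexed after the schema change for this to take effect.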
You can also use a whitespace tokenizer (solr.WhitespaceTokenizerFactory) to produce the tokens and apply lowercasing as a filter (solr.LowerCaseFilterFactory).
Also, remember that an analyzer always executes in the following order, irrespective of the order in which you declare the elements: char filters first, then the tokenizer, then the token filters.
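As a rough illustration of that order, here is a sketch of the whitespace tokenizer variant mentioned above, with each element annotated by the stage at which it runs. The field type name is reused from the question, and the exact combination of filters is an assumption rather than a tested configuration:

```xml
<fieldType name="text_special_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Stage 1: any charFilter elements would run here, on the raw character stream -->
    <!-- Stage 2: the tokenizer splits the text into tokens on whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Stage 3: token filters run on each token, in the order they are declared -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Declaring the elements in execution order, as here, keeps the schema easier to read, since the file then matches what the analyzer actually does.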