I’m new to Apache Solr and trying to make a query using search terms against a field called “normalizedContents” and of type “text”.
All of the search terms must exist in the field. Problem is, I’m getting inconsistent results.
For example, the Solr index has only one document, whose normalizedContents field has the value “EDOUARD SERGE WILFRID EDOS0004 UNE MENTION COMPLEMENTAIRE”
I tried these queries in Solr’s web interface:
- normalizedContents:(edouard AND une) returns the result
- normalizedContents:(edouar* AND une) returns the result
- normalizedContents:(EDOUAR* AND une) doesn’t return anything
- normalizedContents:(edouar AND une) doesn’t return anything
- normalizedContents:(edouar* AND un) returns the result (although there’s no “un” word)
- normalizedContents:(edouar* AND uned) returns the result (although there’s no “uned” word)
Here’s the declaration of normalizedContents in schema.xml:
<field name="normalizedContents" type="text" indexed="true" stored="true" multiValued="false"/>
So, wildcards and the AND operator do not behave as I expected. What am I doing wrong?
Thanks.
By default, the field type text applies stemming to the content (solr.SnowballPorterFilterFactory). Thus ‘un’ and ‘uned’ match ‘une’, since all three are reduced to the same stem. You might also not have the solr.LowerCaseFilterFactory filter on both the query and the index analyzer, which would explain why EDOUAR* does not match. The 4th query doesn’t match because edouard is not stemmed to edouar. If you want exact matches, you should copy the data into another field whose type has a more limited set of filters, e.g. only a solr.WhitespaceTokenizerFactory.
Posting the <fieldType name="text"> section from your schema might be helpful to understand everything.
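As a rough sketch of the "copy into another field" idea, something along these lines could go into schema.xml. The names text_exact and normalizedContentsExact are made up for illustration, and adding solr.LowerCaseFilterFactory on top of the whitespace tokenizer is my own assumption (drop it if matching should be case-sensitive). Depending on your Solr version, wildcard terms may not pass through the analyzer at all, so you may still need to lowercase them yourself before querying.

```xml
<!-- Hypothetical field type: whitespace tokenization only, no stemming.
     The lowercase filter is an assumption, not part of the original advice. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical companion field, filled from the original one at index time -->
<field name="normalizedContentsExact" type="text_exact" indexed="true" stored="false" multiValued="false"/>
<copyField source="normalizedContents" dest="normalizedContentsExact"/>
```

After reindexing, a query such as normalizedContentsExact:(edouard AND une) would be matched against the unstemmed tokens, so ‘un’ and ‘uned’ should no longer hit.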