I’ve seen posts on performing autocomplete across multiple fields but not on performing autocomplete on multivalued fields.
My autocomplete feature is working for non-multivalued fields.
My problem is that when I run the query on the multivalued field, all the values in the multivalued field of every matching document are returned in the facet results, not just the values that match the query.
Below is my schema, similar to what is proposed in the Solr 4 Cookbook.
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="publisherText-str" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="publisherText-ac" type="text_autocomplete" indexed="true" stored="true" required="false" multiValued="true"/>
As you can see, publisherText is a multivalued field. I execute a query like this to test the autocomplete feature:
/select?q=publisherText-ac:new&facet=true&facet.field=publisherText-str&facet.mincount=1&rows=0
The query is “new”, and this matches a set of documents. However, the facet result set also contains every other publisherText value stored in the multivalued field of each matching document.
Update: When querying “new”, the result set should include “New York Times” and “Times New Roman” but does not need to solve the infix problem: “Knewton Gazette” does not need to be in the result set.
Is there a way to have the facet result only contain values that match the query?
Or is there a different (better?) way to support the full autocomplete feature that handles multiValued fields more gracefully?
Thanks.
I think the best approach would be to create a separate collection or core (depending on whether you are using SolrCloud or not) and index your data in a way that lets it be queried for the desired result. Of course, this may not be possible in some cases, but if it is in yours, go for it. Such a core would hold only the fields and data relevant to your autocomplete, so in most cases it will be smaller than the original core and contain fewer terms, which should result in faster queries. In addition, such a core or collection can be optimized for autocomplete queries, so you'll gain even more performance out of it.
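As a rough sketch of that idea, assume a dedicated core in which every distinct publisher value is indexed as its own document. The field names publisher and publisher-ac below are made up for illustration; the text_autocomplete type is the one from the question:
<!-- hypothetical fields for a dedicated autocomplete core: one publisher value per document -->
<field name="publisher" type="string" indexed="true" stored="true" required="true"/>
<field name="publisher-ac" type="text_autocomplete" indexed="true" stored="false"/>
<!-- feed the analyzed autocomplete field from the raw value -->
<copyField source="publisher" dest="publisher-ac"/>
A query against that core could then look like this:
/select?q=publisher-ac:new&fl=publisher&rows=10
Because each document carries a single value, only the values that match come back, and no faceting is needed. With the analysis chain from the question, “New York Times” and “Times New Roman” should match “new”, while “Knewton Gazette” should not. If publisher also serves as the uniqueKey, duplicate values would collapse into one document, but that is a design choice to check against your own data.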
However, if you can't use the multiple cores/collections approach, then highlighting may be the best way to go if you need filtering. In that case, you may want to turn on term vectors and use the FastVectorHighlighter to get better performance out of Solr highlighting (http://solr.pl/en/2011/06/13/solr-3-1-fastvectorhighlighting/).
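A minimal sketch of the highlighting setup, assuming the publisherText-ac field from the question is kept as is and only the term vector attributes are added (a full reindex is needed after changing them):
<!-- term vectors with positions and offsets are what the FastVectorHighlighter relies on -->
<field name="publisherText-ac" type="text_autocomplete" indexed="true" stored="true" required="false" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
The query would then ask for highlights instead of facets:
/select?q=publisherText-ac:new&hl=true&hl.fl=publisherText-ac&hl.useFastVectorHighlighter=true&hl.snippets=10&rows=10
The highlighting section of the response should then contain snippets taken from the values that matched rather than from every value of the field. The hl.snippets value of 10 is an arbitrary choice (the default is 1), and the exact snippet boundaries with edge n-grams depend on the analysis chain, so it is worth checking the output on your own data.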