I am using Haystack with Elasticsearch as the backend to implement search in my Django app, and I don't understand how it handles stemming. My indexed model has the word “embedded” in its text. A search for “embedded” returns the correct result, but a search for “embed” returns nothing.
I am running my query the simplest way the docs show:
from haystack.query import SearchQuerySet
SearchQuerySet().filter(content='embed')
I dug into the code and found that Elasticsearch was being hit with:
import requests
url = 'http://127.0.0.1:9200/haystack/modelresult/_search?from=0&size=20'
kwargs = {"data": '{"query": {"filtered": {"filter": {"fquery": {"query": {"query_string": {"query": "django_ct:(component_catalog.component)"}}, "_cache": true}}, "query": {"query_string": {"query": "(embed)", "default_operator": "AND", "default_field": "text", "auto_generate_phrase_queries": true, "analyze_wildcard": true}}}}}', "timeout": 10}
requests.get(url, **kwargs)
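Rather than replaying the hand-escaped JSON string, the same body can be built as a Python dict and serialized with json.dumps, which makes it easier to tweak (for example, swapping “embed” for “embedded”) while debugging. This is a sketch: the helper name build_search_body is hypothetical, and the structure is simply copied from the logged request above (an Elasticsearch 1.x-era “filtered” query).

```python
import json


def build_search_body(term, model="component_catalog.component"):
    """Rebuild the logged Haystack request body as a dict.

    build_search_body is a hypothetical helper; the keys mirror the
    request captured above, and `model` defaults to its django_ct value.
    """
    return {
        "query": {
            "filtered": {
                # Restrict hits to one Django model via the django_ct field.
                "filter": {
                    "fquery": {
                        "query": {
                            "query_string": {"query": "django_ct:(%s)" % model}
                        },
                        "_cache": True,
                    }
                },
                # The user's term, wrapped in a Lucene group such as (embed).
                "query": {
                    "query_string": {
                        "query": "(%s)" % term,
                        "default_operator": "AND",
                        "default_field": "text",
                        "auto_generate_phrase_queries": True,
                        "analyze_wildcard": True,
                    }
                },
            }
        }
    }


# Serialize to the same JSON shape as the "data" string in the log.
body = json.dumps(build_search_body("embed"))
```

The resulting string can be passed exactly like the original, e.g. requests.get(url, data=body, timeout=10), with url as defined above.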
Questions:
Why does Haystack not return stemmed results?
What does (embed) mean?
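query_string uses Lucene’s query syntax, so (embed) represents a logical grouping around ‘embed’. The Lucene docs illustrate grouping with a query like (jakarta OR apache) AND website; a single term in parentheses is just a group of one and matches the same documents as the bare term. For your situation, you can ignore it. It’s probably something that Haystack inserts automatically.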
My first inclination is to say that your mapping is incorrect. How have you analyzed and indexed your field? Did you use the Snowball stemmer?
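For reference, here is a sketch of index settings plus a mapping whose text field runs through a Snowball stemmer. It assumes an Elasticsearch 1.x-style index (string fields, mappings keyed by type name); the index and type names “haystack” and “modelresult” come from the URL above, while the analyzer name english_snowball and the filter name english_snowball_filter are hypothetical. The standard tokenizer and the lowercase and snowball token filters are built into Elasticsearch. Whether a given pair of words ends up with the same stem is worth confirming with a curl analyzer test rather than assuming.

```python
import json

# Hypothetical names; only the tokenizer/filter *types* are Elasticsearch built-ins.
ANALYZER_NAME = "english_snowball"
FILTER_NAME = "english_snowball_filter"


def build_index_definition(analyzer=ANALYZER_NAME):
    """Settings and mapping for an index whose "text" field is stemmed.

    Assumes Elasticsearch 1.x-style mappings keyed by type name.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Snowball token filter configured for English.
                    FILTER_NAME: {"type": "snowball", "language": "English"}
                },
                "analyzer": {
                    analyzer: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so the stemmer sees normalized tokens.
                        "filter": ["lowercase", FILTER_NAME],
                    }
                },
            }
        },
        "mappings": {
            "modelresult": {
                "properties": {
                    # Without a separate search_analyzer, this analyzer is
                    # applied both when indexing and when querying the field.
                    "text": {"type": "string", "analyzer": analyzer}
                }
            }
        },
    }


payload = json.dumps(build_index_definition(), indent=2)
```

This payload would be the body of a PUT to http://127.0.0.1:9200/haystack when the index is created. Haystack normally creates that index itself, so settings applied by hand may be overwritten when the index is rebuilt.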
You can use curl to test out how the various analyzers respond, which is a handy trick when you aren’t getting the results you want: