This string is indexed: “Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.”
My query is: “Hello world. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Other text.”
When I run the query I get no results. How can I still get a match when my query contains a little extra “garbage” text?
I am using Django, Haystack, and Elasticsearch.
If you use a “match” query with the default operator of “or”, you’ll get hits where any of the words match, but documents matching many of the words will rank above documents matching only a few.
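As a rough sketch, the request body for such a query might look like the following, built here as a plain Python dict. The field name `text` is a hypothetical placeholder, and with Haystack the backend normally builds the query for you, so this only illustrates the shape of the query DSL.

```python
def build_match_query(field, text, operator="or"):
    """Return an Elasticsearch match query body.

    With operator "or", a document matches if any analysed term of `text`
    appears in `field`; documents containing more of the terms score higher.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": operator,
                }
            }
        }
    }


body = build_match_query(
    "text",  # hypothetical field name
    "Hello world. Lorem ipsum dolor sit amet, consectetur adipisicing elit, "
    "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Other text.",
)
```

The resulting dict could be sent as the JSON body of a `_search` request. Because “or” is the default operator, spelling it out is optional; it is included here only to make the behaviour explicit.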
http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html
But if you mean you only want to match that exact phrase while allowing some additional text on either end, I’m not sure you can do precisely that.
One option, if you can relax the requirement for an exact phrase match, would be to analyse the documents (and the query) using a shingle token filter.
http://www.elasticsearch.org/guide/reference/index-modules/analysis/shingle-tokenfilter.html
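A minimal sketch of index settings that define such an analyzer, written as a Python dict. The names `my_shingle` and `shingle_analyzer` are hypothetical, and the field you search would also need to be mapped to use this analyzer (how to wire that into Haystack’s backend is not shown here). Setting `output_unigrams` to false is one possible choice: it means only word pairs and triplets are indexed, but it also means a one-word query produces no tokens at all.

```python
import json

# Hypothetical names: "my_shingle" and "shingle_analyzer" are placeholders.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    # Emit only word pairs/triplets, not single words, so that
                    # chance overlaps of individual words do not produce hits.
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_shingle"],
                }
            },
        }
    }
}

print(json.dumps(index_settings, indent=2))
```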
Then a match query with the “or” operator would operate on pairs, triplets, quads, etc. of words (depending on the filter configuration). Keeping the shingle size small (2 or 3 words) makes it unlikely that a document that merely happens to share many individual words with the query will score highly, because a shingle only matches when the words appear next to each other in the same order.
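To illustrate why this tolerates the extra text, here is a pure-Python approximation of a 2–3 word shingle filter (it splits on non-word characters and lowercases, which is only roughly what Elasticsearch’s standard tokenizer does). Every shingle of the indexed sentence also appears among the query’s shingles, and the garbage words only add a handful of extra shingles; a document with the same words in a different order shares none.

```python
import re


def shingles(text, min_size=2, max_size=3):
    """Approximate a shingle filter: lowercase word n-grams of min_size..max_size words."""
    words = re.findall(r"\w+", text.lower())
    out = set()
    for n in range(min_size, max_size + 1):
        for i in range(len(words) - n + 1):
            out.add(" ".join(words[i:i + n]))
    return out


doc = ("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod "
       "tempor incididunt ut labore et dolore magna aliqua.")
query = "Hello world. " + doc + " Other text."

doc_sh = shingles(doc)
query_sh = shingles(query)
shared = doc_sh & query_sh                    # shingles an "or" match would hit
extra = query_sh - doc_sh                     # shingles caused by the garbage text
scrambled = shingles("amet sit dolor ipsum lorem")  # same words, different order

print(len(doc_sh), len(shared), len(extra), len(scrambled & query_sh))
```

Running it prints `35 35 8 0`: all 35 document shingles are matched, the garbage contributes 8 unmatched shingles, and the scrambled text matches nothing.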
Or you could use a phrase query with slop (see the bottom of the match query page linked above).
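A sketch of such a query body, again as a plain Python dict with a hypothetical field name `text`. In current Elasticsearch versions this is the `match_phrase` query; older versions, such as the one the linked page documents, expressed it as a `match` query with `"type": "phrase"`. Note that a phrase query still requires every query term to occur in the document, so slop lets the document contain extra words between the phrase terms; on its own it does not ignore words that appear only in the query.

```python
def build_phrase_query(field, text, slop=0):
    """Return a match_phrase query body.

    `slop` is how many positional moves the query terms may make to line up
    with the document terms; 0 requires the exact phrase, in order.
    """
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


body = build_phrase_query(
    "text",  # hypothetical field name
    "Lorem ipsum dolor sit amet",
    slop=2,
)
```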
Note, though, that both of these approaches would also accept extra words inserted in the middle of the phrase, not just additional text before or after it.