I am still doing something wrong.
Could somebody please help me?
I want to create a custom analyzer with an ASCII-folding filter in Rails + Mongoid.
I have a simple Product model with a single name field.
class Product
  include Mongoid::Document
  field :name
  settings analysis: {
    analyser: {
      ascii: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['lowercase', 'asciifolding']
      }
    }
  }
  mapping do
    indexes :name, analyzer: 'ascii'
  end
end
Product.create(name:"svíčka")
Product.search(q:"svíčka").count #1
Product.search(q:"svicka").count #0 can't find - expected 1
Product.create(name:"svicka")
Product.search(q:"svíčka").count #0 can't find - expected 1
Product.search(q:"svicka").count #1
When I check the index with elasticsearch-head, I expect the term to be indexed without accents, like “svicka”, but it looks like “Svíčka”.
What am I doing wrong?
When I check it with the API, it looks OK:
curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=asciifolding' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
http://localhost:9200/development_imango_products/_mapping
{"development_imango_products":{"product":{"properties":{"name":{"type":"string","analyzer":"ascii"}}}}}
curl -XGET 'localhost:9200/development_imango_products/_analyze?field=name' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
You can check how you are actually indexing your document using the analyze API.
You also need to take into account that there’s a difference between what you index and what you store. What you store is what gets returned when you query, and it is exactly what you sent to elasticsearch. What you index determines which documents match a query.
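The difference between stored and indexed content can be illustrated with a small sketch in plain Ruby. It does not talk to elasticsearch at all: ToyIndex, analyze and fold_ascii are hypothetical names, and the folding step only approximates the asciifolding filter by stripping combining marks.

```ruby
# Rough stand-in for the chain in the question:
# whitespace tokenizer -> lowercase -> asciifolding.
# Assumption: stripping combining marks is close enough to asciifolding
# for this example; the real filter covers many more characters.
def fold_ascii(token)
  token.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

def analyze(text)
  text.split(/\s+/).reject(&:empty?).map { |t| fold_ascii(t.downcase) }
end

# Hypothetical toy index: keeps the stored source untouched and keeps the
# analyzed tokens in a separate lookup table.
class ToyIndex
  def initialize
    @stored = []                             # documents exactly as sent
    @terms  = Hash.new { |h, k| h[k] = [] }  # token => document ids
  end

  def add(name)
    id = @stored.size
    @stored << name
    analyze(name).each { |token| @terms[token] << id }
    id
  end

  # The query text goes through the same analyzer as the indexed text.
  def search(query)
    ids = analyze(query).flat_map { |token| @terms.fetch(token, []) }.uniq
    ids.map { |id| @stored[id] }
  end
end

index = ToyIndex.new
index.add('Svíčka')
p index.search('svicka')  # matches via the folded token, returns the stored text
p index.search('svíčka')  # same match, same stored text
```

Both searches hit the folded token svicka, yet the value that comes back is the original Svíčka. That is why seeing accents in a returned document does not, by itself, mean the analyzer was skipped.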
Using asciifolding is a good choice for your use case; it should return results whether you query for svíčka or svicka. I guess there’s just a typo in your settings: analyser should be analyzer. With that typo, the analyzer is probably not being applied as you’d expect.
UPDATE
Given your comment, you haven’t solved the problem yet. Can you check what your mapping looks like (localhost:9200/index_name/_mapping)? The way you’re using the analyze API is not that useful, since you’re manually providing the text analysis chain, and that doesn’t mean the same chain is applied to your field. It is better to provide the name of the field, like this:
curl -XGET 'localhost:9200/index_name/_analyze?field=field_name' -d 'svíčka'
That way the analyze api will rely on the actual mapping for that field.
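If you prefer to build that request from Ruby, a minimal sketch could look like the following. The helper name analyze_field_url is hypothetical, and the host, port and index name are assumptions taken from the examples in the question; the text to analyze still goes in the request body, as in the curl call.

```ruby
require 'uri'

# Hypothetical helper: builds the _analyze URL that relies on the field's
# mapping instead of an explicitly supplied tokenizer/filter chain.
# Host and port defaults are assumptions based on the question.
def analyze_field_url(index, field, host: 'localhost', port: 9200)
  URI::HTTP.build(
    host: host,
    port: port,
    path: "/#{index}/_analyze",
    query: URI.encode_www_form(field: field)
  ).to_s
end

puts analyze_field_url('development_imango_products', 'name')
```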
UPDATE 2
After you made sure that the mapping is correctly submitted and everything looks fine, I noticed you’re not specifying the field that you want to query. If you don’t specify it, you’re querying the _all special field, which by default contains all the fields that you’re indexing and by default uses the StandardAnalyzer. That analyzer does not fold accents, so svicka and svíčka end up as different terms there. You should use the following query: name:svíčka.
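As a rough sketch of what a field-scoped query looks like when sent as a URI search, here is a small Ruby example. The helpers field_query and search_url are hypothetical, and the host, port and index name are assumptions copied from the question.

```ruby
require 'uri'

# Hypothetical helper: restricts a query-string term to a single field,
# so the field's own analyzer is used instead of the one for _all.
def field_query(field, term)
  "#{field}:#{term}"
end

# Hypothetical helper: builds a URI search URL with the query URL-encoded.
# Host and port defaults are assumptions based on the question.
def search_url(index, query, host: 'localhost', port: 9200)
  URI::HTTP.build(
    host: host,
    port: port,
    path: "/#{index}/_search",
    query: URI.encode_www_form(q: query)
  ).to_s
end

query = field_query('name', 'svíčka')
puts query
puts search_url('development_imango_products', query)
```

Assuming the model's search method passes q through unchanged, as the calls in the question suggest, the same string could be used there, e.g. Product.search(q: "name:svíčka").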