We have an application that receives text in different languages. Our aim is to detect the language of each text, analyze it with a stemmer specific to that language, and index it. I am already able to detect the language using Solr’s Language Detection mechanism.
Now I want to analyze the texts on the fly, using a different fieldType for each language, and store each text in a language-specific field.
For example, say I have the following fields in schema.xml.
<!-- English -->
<field name="text_en" type="text_en" indexed="true" stored="true"/>
<!-- German -->
<field name="text_de" type="text_de" indexed="true" stored="true"/>
<!-- Turkish -->
<field name="text_tr" type="text_tr" indexed="true" stored="true"/>
When I detect that a text is in English, I want to add it dynamically to the text_en field, which is analyzed/stemmed differently from the other fields.
Is there a built-in mechanism in Solr that supports this? If so, how can I configure it? Or should I develop a plugin for this purpose?
Please take a look at the language detection parameters, in particular the field-mapping ones (langid.map and related).
As far as I can tell, you first have to use the default mapping or override it, so that each detected language is mapped to a field whose name contains the language code (en, de, ...).
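To make the mapping idea concrete, here is a rough sketch of how the update chain in solrconfig.xml might look. I have not tested it against your setup. The chain name "langid", the input field "text", the language field "language_s", and the whitelist/fallback values are all assumptions for illustration. With langid.map enabled and the default mapping, the content of "text" should end up in text_en, text_de or text_tr, which matches the fields from your schema.xml.

```xml
<!-- Sketch only: chain name, input field "text" and "language_s" are assumptions -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- Field(s) whose content is used to detect the language -->
    <str name="langid.fl">text</str>
    <!-- Field that receives the detected language code -->
    <str name="langid.langField">language_s</str>
    <!-- Enable mapping of the input field to <field>_<language code> -->
    <bool name="langid.map">true</bool>
    <str name="langid.map.fl">text</str>
    <!-- Restrict detection to languages that have a matching field -->
    <str name="langid.whitelist">en,de,tr</str>
    <!-- Language to use when detection fails or is not whitelisted -->
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain still has to be selected for your update requests, for example through the update.chain request parameter. If your field names do not follow the name_language pattern, langid.map.pattern and langid.map.replace should let you override the default.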
Take a look at this:
http://alisalimi25.blogspot.de/2012/07/phonetic-search-and-language-detection.html
…and the example, which looks like it will fill the fields title_na, title_da, …
Sorry, I’m not 100% sure, but this is how I interpret the documentation.