Im looking for something like a search and replace functionality in Solr.
I have dumped a document into solr, and doing some text analysis over it. At times i may need to group couple of words together and want solr to treat it as one single token.
For ex: “South Africa” will be treated as one single token for further processing. And also notice that these can be dynamic and im going to let the end user to decide which words he/she has to group. So NO Semantics required.
My current plan is to add a special character between these two words so Solr will treat it as one single token (StandardTokenizerFactory) for further processing.
So im looking for something like:
replace("South Africa",South_Africa")
Can anyone has any solution?
Use a Synonym filter and define these replacements in a synonyms.txt file. Once you have all of your definitions, rebuild the index.
You would probably have an entry like this to handle both the case where a field has a LowerCase filter before Synonym and where Synonym comes before LowerCase.
South Africa,south africa => southafrica
More info here http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory