In its enthusiasm to stem tokens into lexemes, PostgreSQL's full-text search engine also reduces proper nouns. For instance:
    essais=> select to_tsquery('english', 'bortzmeyer');
     to_tsquery
    ------------
     'bortzmey'

    essais=> select to_tsquery('english', 'balling');
     to_tsquery
    ------------
     'ball'
    (1 row)
At least for the first one, I’m sure it is not in the English dictionary! What is the best way to avoid this spurious stemming?
The point of a stemming algorithm is not to reduce every word to its linguistically proper stem; the goal is to reduce words that are alike to a common stemmed form. That form is generally not meant to be presented to the user: even if ‘balling’ and ‘ball’ both produced ‘kjebnkkekaa’, the algorithm would still be correct, because it treats ‘balling’ and ‘ball’ as concerning the same thing.
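To make this concrete, here is a minimal sketch that can be run in psql. It assumes only the built-in 'english' configuration already used in the question; the column alias 'matches' is just an illustrative name. Because the document and the query both go through the same stemmer, they meet on the same lexeme, whatever that lexeme looks like.

```sql
-- Document and query are stemmed the same way, so 'balling' in the
-- document is found by a query for 'ball'.
select to_tsvector('english', 'balling') @@ to_tsquery('english', 'ball') as matches;

-- The proper noun is truncated identically on both sides, so a search
-- for it still finds it, even though the stored lexeme looks odd.
select to_tsvector('english', 'Bortzmeyer') @@ to_tsquery('english', 'bortzmeyer') as matches;
```

Both queries should report a match. The odd-looking lexeme only becomes a problem if it is shown to users, or if the truncated form collides with an unrelated word.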
Also beware that no stemming algorithm is perfect. For more information, look up the Porter stemming algorithm.
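If the truncated form really must be avoided, one option, offered here as a suggestion rather than as the recommended fix, is PostgreSQL's built-in 'simple' configuration, which lowercases tokens but does not stem them. The sketch below assumes nothing beyond that built-in configuration; 'matches' is again an illustrative alias.

```sql
-- The 'simple' configuration keeps the token as typed (lowercased),
-- so the proper noun is not truncated.
select to_tsquery('simple', 'bortzmeyer');

-- The price: without stemming, related word forms no longer match.
select to_tsvector('simple', 'balling') @@ to_tsquery('simple', 'ball') as matches;
```

The trade-off is visible in the second query: with stemming switched off, ‘balling’ and ‘ball’ become unrelated lexemes. Whichever configuration is chosen, it has to be the same one for indexing and for querying.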