Simple question: when should we stem or lemmatize words? Is stemming helpful for all NLP tasks, or are there applications where using the full form of words might give better accuracy or precision?
In the context of machine-learning-based NLP, stemming makes your training data denser. It reduces the size of the vocabulary (the number of distinct words in the corpus) two- or three-fold, or even more for highly inflected languages like French, where a single verb stem can generate dozens of word forms.
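With the same corpus but fewer input dimensions, each remaining feature has more examples behind it, so an ML model will generally work better. Recall in particular should improve, because different forms of the same word now match each other.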
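To make the vocabulary-reduction point concrete, here is a minimal sketch. The `toy_stem` function and the tiny corpus are made up for illustration; it is a crude suffix stripper, not the Porter or Snowball algorithm, so real stemmers will give different stems and different ratios.

```python
# Toy suffix-stripping stemmer: an illustrative assumption, not Porter or Snowball.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-corpus with several inflected forms of "walk" and "work".
corpus = [
    "he walks to work",
    "she walked to work",
    "they are walking and working",
    "it works and she worked",
]

tokens = [tok for line in corpus for tok in line.split()]
raw_vocab = set(tokens)                           # distinct surface forms
stemmed_vocab = {toy_stem(tok) for tok in tokens}  # distinct stems

print(len(raw_vocab), "distinct surface forms")
print(len(stemmed_vocab), "distinct stems")
```

On this corpus the vocabulary shrinks from 14 surface forms to 9 stems, since "walks", "walked" and "walking" all collapse to "walk" (and likewise for "work"). Each remaining feature is then observed more often for the same amount of text.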
The downside is that if the actual word (as opposed to its stem) makes a difference in some cases, your system won’t be able to leverage it. So you might lose some precision.
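The precision loss can be sketched with a tiny keyword search. Everything here is hypothetical: the two documents, the `search` helper, and the `toy_stem` suffix stripper (again a crude stand-in, not a real stemming algorithm). The point is only that once "news" and "new" share a stem, the system can no longer tell them apart.

```python
def toy_stem(word: str) -> str:
    """Crude illustrative stemmer: strips 'ing' or 's', keeping >= 3 characters."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical document collection.
docs = {
    "d1": "evening news broadcast",
    "d2": "new shoes for sale",
}


def search(query: str, normalize) -> list:
    """Return ids of documents containing the (normalized) query term."""
    q = normalize(query)
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if q in {normalize(tok) for tok in text.split()}
    )


exact = search("news", lambda w: w)  # match on full word forms
stemmed = search("news", toy_stem)   # match on stems: "news" and "new" collide

print("exact:  ", exact)
print("stemmed:", stemmed)
```

With full forms, the query "news" returns only `d1`. With stems, it also returns `d2`, because "news" and "new" both reduce to "new". Nothing relevant was missed, but half of the results are now wrong, which is the precision cost described above.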