Can someone please explain what this code does?
import re

def stemmer(word):
    # The pattern is anchored at both ends, so unpack the single (stem, end) match
    [(stem, end)] = re.findall(r'^(.*ss|.*?)(s)?$', word)
    return stem
It splits a word into two parts: stem and end. There are three cases:
- The word ends with ss (or even more s): stem <- word and end <- ""
- The word ends with a single s: stem <- word without the final "s" and end <- "s"
- The word does not end with s: stem <- word and end <- ""

This is done by a regular expression that matches the full word (due to ^ ... $). The first group (i.e. stem) takes either as much as possible ending in ss (.*ss) or, if that is not possible, as little as possible (.*?). An optional trailing s is then captured as the end part.

Note that in the first case (as much as possible ending in ss) there can never be an additional s left over for the end part, because the greedy .* already extends up to the last two s characters of the word.
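
To see the three cases in action, here is a minimal sketch. The helper name split_word is hypothetical (it is not in the question); it uses the same regular expression but returns both parts instead of only the stem.

```python
import re

def split_word(word):
    # Hypothetical helper: same pattern as the question's stemmer,
    # but returns (stem, end) so both captured groups are visible.
    [(stem, end)] = re.findall(r'^(.*ss|.*?)(s)?$', word)
    return stem, end

print(split_word("glass"))   # ('glass', '')  ends with ss, nothing stripped
print(split_word("classs"))  # ('classs', '') more than two s, still nothing stripped
print(split_word("cats"))    # ('cat', 's')   single trailing s is split off
print(split_word("cat"))     # ('cat', '')    no trailing s
```

So the function in the question is a very crude plural stripper: it removes one trailing s unless the word ends in ss.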