I want to remove stop words. Here is my code:
import nltk
from nltk.corpus import stopwords
import string
u="The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family (Rosaceae). It is one of the most widely cultivated tree fruits, and the most widely known of the many members of genus Malus that are used by humans."
v="An orange is a fruit of the orange tree. it is the most cultivated tree fruits"
u=u.lower()
v=v.lower()
u_list=nltk.word_tokenize(u)
v_list=nltk.word_tokenize(v)
for word in u_list:
    if word in stopwords.words('english'):
        u_list.remove(word)
for word in v_list:
    if word in stopwords.words('english'):
        v_list.remove(word)
print(u_list)
print("\n\n\n\n")
print(v_list)
But only some of the stop words are removed. Please help me with this.
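The problem with what you are doing is that list.remove(x) only removes the first occurrence of x, not every x. On top of that, removing items from a list while you are looping over that same list makes the loop skip the element that follows each removed one. To remove every instance, you could use filter, but I would opt for something like this: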
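One way to write that idea, as a minimal sketch assuming Python 3: build a new list with a comprehension (or with filter) instead of calling remove() on the list you are iterating. To keep it self-contained, the stop-word set and the whitespace split below are hypothetical stand-ins; with NLTK installed you would pass set(stopwords.words('english')) and tokenize with nltk.word_tokenize(). The function names are illustrative only.

```python
# Sketch: filter stop words by building a new list instead of calling
# list.remove() inside a loop over the same list.

def remove_stopwords(tokens, stop_words):
    """Return a new list containing only the tokens not in stop_words."""
    return [w for w in tokens if w not in stop_words]


def remove_stopwords_filter(tokens, stop_words):
    """Same result using the built-in filter(); list() is needed in Python 3."""
    return list(filter(lambda w: w not in stop_words, tokens))


def buggy_remove(tokens, stop_words):
    """The pattern from the question: mutating the list while iterating it."""
    for w in tokens:
        if w in stop_words:
            tokens.remove(w)
    return tokens


if __name__ == "__main__":
    # Hypothetical stand-in for set(stopwords.words('english')).
    stop_words = {"the", "is", "a", "of", "it", "an"}
    # Whitespace split as a stand-in for nltk.word_tokenize().
    tokens = "it is the most cultivated tree of the orange family".split()

    print(remove_stopwords(tokens, stop_words))
    # ['most', 'cultivated', 'tree', 'orange', 'family']
    print(remove_stopwords_filter(tokens, stop_words))
    # ['most', 'cultivated', 'tree', 'orange', 'family']
    print(buggy_remove(list(tokens), stop_words))
    # ['is', 'most', 'cultivated', 'tree', 'the', 'orange', 'family']
```

The last call shows the original problem: "is" and "the" survive because each removal shifts the next word into the slot the loop has already visited. Building the stop-word set once also avoids calling stopwords.words('english') once per token, and membership tests on a set are faster than scanning a list.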