Assume I have a string text = A compiler translates code from a source

Editorial Team

Asked: June 3, 2026 (2026-06-03T13:15:19+00:00)

Assume I have a string text = "A compiler translates code from a source language". I want to do two things:

1. I need to iterate through each word and stem it using the NLTK library. The function for stemming is PorterStemmer().stem_word(word), which takes the word as its argument. How can I stem each word and get back the stemmed sentence?
2. I need to remove certain stop words from the text string. The list of stop words is stored in a text file (space separated):
```
stopwordsfile = open('c:/stopwordlist.txt','r+')
stopwordslist=stopwordsfile.read()
```
How can I remove those stop words from text and get a cleaned new string?


1 Answer

Editorial Team · Answer 1 · 2026-06-03T13:15:24+00:00

I posted this as a comment, but thought I might as well flesh it out into a full answer with some explanation:

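You want to use str.split() to split the string into words, and then stem each word. Note that current NLTK releases name the method stem(); stem_word() is the older name used in the question:

from nltk.stem import PorterStemmer

for word in text.split():
    print(PorterStemmer().stem(word))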

As you want to get a string of all the stemmed words together, it’s trivial to then join these stems back together. To do this easily and efficiently we use str.join() and a generator expression:

" ".join(PorterStemmer().stem(word) for word in text.split())
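As a rough, self-contained sketch of the split/stem/join idea, the snippet below wraps it in a helper that takes the stemming function as a parameter. The names `stem_sentence` and `toy_stem` are made up for illustration; `toy_stem` is only a crude stand-in so the example still runs when NLTK is not installed, and it is not the Porter algorithm.

```python
def stem_sentence(text, stem):
    """Split on whitespace, stem each word, and rejoin with single spaces."""
    return " ".join(stem(word) for word in text.split())


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips one trailing "s"
    # from words longer than three characters.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


try:
    # Use the real Porter stemmer when NLTK is available.
    from nltk.stem import PorterStemmer
    stem = PorterStemmer().stem
except ImportError:
    stem = toy_stem

text = "A compiler translates code from a source language"
print(stem_sentence(text, stem))
```

Passing the stemmer in as an argument also means the stemmer object is created once rather than once per word.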

Edit:

For your other problem:

with open("/path/to/file.txt") as f:
    words = set(f.read().split())

Here we open the file using the with statement (which is the best way to open files, as it closes them correctly even when an exception is raised, and is more readable) and read the contents into a set. We use a set because we don’t care about the order of the words or about duplicates, and it will be more efficient later. Calling f.read().split() with no arguments splits on any whitespace, so it handles both a space-separated file, as in the question, and a file with one word per line, and it drops the trailing newlines that would otherwise stop the words from matching. If the words are comma separated, pass the separator to str.split() instead.
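Since the question describes the stop-word file as space separated, a loader along the following lines might be a reasonable starting point. The function name `load_stop_words` and the temporary demo file are assumptions made for this sketch; in practice you would pass the path to your own file.

```python
import os
import tempfile


def load_stop_words(path):
    """Read a whitespace-separated word list into a set."""
    with open(path, encoding="utf-8") as f:
        # split() with no arguments splits on spaces, tabs, and newlines
        # and discards empty strings.
        return set(f.read().split())


# Demo only: write a small stop-word file to a temporary location.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("a from the\nof  to")
    demo_path = tmp.name

stop_words = load_stop_words(demo_path)
os.remove(demo_path)
print(sorted(stop_words))
```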

stems = (PorterStemmer().stem(word) for word in text.split())
" ".join(stem for stem in stems if stem not in words)

Here we use the if clause of a generator expression to skip any stem that appears in the set of words we loaded from the file. Membership checks on a set are O(1) on average, so this stays efficient even for a long stop-word list.
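One detail worth watching with the example sentence is case: it starts with a capital "A", which would not match a lowercase "a" in the stop-word set. The sketch below shows one possible way to handle that by comparing lowercase forms. The helper name `remove_stop_words` and the two-word stop list are illustrative assumptions, not part of the original question.

```python
def remove_stop_words(text, stop_words):
    """Drop every word whose lowercase form is in stop_words."""
    stop_words = {w.lower() for w in stop_words}
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


text = "A compiler translates code from a source language"
cleaned = remove_stop_words(text, {"a", "from"})
print(cleaned)  # compiler translates code source language
```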

Edit 2:

To remove the words before they are stemmed, it’s even simpler:

" ".join(PorterStemmer().stem(word) for word in text.split() if word not in words)

The removal of the given words is simply:

filtered_words = [word for word in unfiltered_words if word not in set_of_words_to_filter]
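The order of the two steps can change the result, because a stemmed word may no longer match the entry in the stop-word list. The following sketch illustrates this with a deliberately simple stand-in stemmer; `toy_stem`, `stem_then_filter`, and `filter_then_stem` are hypothetical names, and a real stemmer such as NLTK's PorterStemmer would produce different stems.

```python
def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips one trailing "s".
    return word[:-1] if word.endswith("s") else word


def stem_then_filter(text, stop_words, stem):
    """Stem every word first, then drop stems found in stop_words."""
    stems = (stem(w) for w in text.split())
    return " ".join(s for s in stems if s not in stop_words)


def filter_then_stem(text, stop_words, stem):
    """Drop stop words first, then stem what remains."""
    return " ".join(stem(w) for w in text.split() if w not in stop_words)


text = "compilers translates code"
stop_words = {"translates"}

print(stem_then_filter(text, stop_words, toy_stem))  # compiler translate code
print(filter_then_stem(text, stop_words, toy_stem))  # compiler code
```

If the stop-word list contains unstemmed words, filtering before stemming is usually the safer choice.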
