I have text files from which I need to remove stop words

Asked: May 20, 2026 (2026-05-20T12:59:50+00:00)

I have text files from which I need to remove stop words. The stop words are stored in a separate text file. I load that stop-word file into my Perl script and store the stop words in an array called "stops".

Currently I load the text files to be cleaned into a separate array and then do a pattern match to see whether any of their words are stop words.
I can print the stop words and so know which ones occur in the files, but how do I remove them and write a new text file that contains no stop words?

For example, given these stop words:
the
a
to
of
and
into

Text File:
“The girl was driving and crashed into a man”

Resulting file:
girl was driving crashed man

I load the file in:

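my $dirtoget = "/Users/j/temp/";
opendir( my $dh, $dirtoget ) or die "Cannot open directory $dirtoget: $!";
my @thefiles = readdir($dh);
closedir($dh);

foreach my $f (@thefiles) {
    if ( $f =~ m/\.txt$/ ) {

        open( my $fh, '<', "$dirtoget$f" ) or die "Cannot open $f: $!";

        # Read one line at a time into $_ (slurping with @file = <$fh>
        # here would swallow every line after the first)
        while (<$fh>) {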

Here is the pattern matching loop:

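    foreach my $word (split) {
        foreach my $x (@stops) {
            if ( $x =~ m/\b\Q$word\E\b/ ) {
                print $word, "\n";    # report the stop word that was found
                $word = '';           # then blank it out
                last;                 # no need to test the remaining stop words
            }
        }
    }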

This sets $word to the empty string.

Or I could do:

    $word = '' if exists $stops{$word};

I'm just not sure how to produce an output file that no longer contains the matching words.
Is it a bad idea to store the words that don't match in an array and write them out to a file?

1 Answer

Editorial Team · Answer 1 · 2026-05-20T12:59:51+00:00

Overwriting the files in place is possible, but a hassle. The Unix way of doing this is to write only the non-stop-words to standard output (which print does by default) and redirect that to a new file:

./remove_stopwords.pl textfile.txt > withoutstopwords.txt

then proceed with the file withoutstopwords.txt. This also allows the use of the program in a pipeline.
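
As a rough illustration of that approach, here is a minimal sketch of what remove_stopwords.pl could look like. It is not the asker's script: the file name stopwords.txt, the helper names load_stopwords and remove_stopwords, and the choice to compare words case-insensitively with surrounding punctuation ignored are all assumptions made for the example. It uses a hash for the stop words, as the question already suggests, and keeps only the words that are not in it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the stop-word list lives in this file, one word per line.
my $STOP_FILE = 'stopwords.txt';

# Read the stop words into a hash so each lookup is a single exists test.
sub load_stopwords {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    my %stops;
    while ( my $w = <$fh> ) {
        $w =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
        $stops{ lc $w } = 1 if length $w;
    }
    close $fh;
    return \%stops;
}

# Return the line with every stop word dropped. Kept words are left
# exactly as they were (case and punctuation included); only the
# comparison is lower-cased and stripped of surrounding punctuation.
sub remove_stopwords {
    my ( $line, $stops ) = @_;
    my @kept = grep {
        my $bare = lc $_;
        $bare =~ s/^\W+|\W+$//g;
        !exists $stops->{$bare};
    } split ' ', $line;
    return join ' ', @kept;
}

# Act as a filter only when at least one input file (or "-" for
# standard input) is named on the command line.
if (@ARGV) {
    my $stops = load_stopwords($STOP_FILE);
    while ( my $line = <> ) {
        print remove_stopwords( $line, $stops ), "\n";
    }
}
```

With the stop words from the question, the plain-ASCII line "The girl was driving and crashed into a man" comes out as "girl was driving crashed man". Because the filtered text goes to standard output, the same script can feed another command, for example ./remove_stopwords.pl textfile.txt | sort. Collecting the surviving words in an array before printing, as the question proposes, is exactly what the grep does here for each line.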
