I am using the following example from Lingua::StopWords:
use Lingua::StopWords qw( getStopWords );
my $stopwords = getStopWords('en');
my @words = qw( i am the walrus goo goo g'joob );
# prints "walrus goo goo g'joob"
print join ' ', grep { !$stopwords->{$_} } @words;
How do I get it to use my $document, remove stopwords and print the results to a file? See my code here:
open(FILESOURCE, "sample.txt") or die("Unable to open requested file.");
my $document = <FILESOURCE>;
close (FILESOURCE);
open(TEST, "results_stopwords.txt") or die("Unable to open requested file.");
use Lingua::StopWords qw( getStopWords );
my $stopwords = getStopWords('en');
print join ' ', grep { !$stopwords->{$_} } $document;
I tried these variations:
print join ' ', grep { !$stopwords->{$_} } TEST;
print TEST join ' ', grep { !$stopwords->{$_} } @words;
Basically, how do I read in a document, remove the stop words and then write the result to a new file?
In your program, you forgot to tokenise the input text into words. A simplistic alternative to
Lingua::EN::Splitter::words is to split a line on spaces into a list of words (approximately).
Taking tchrist's comment into account, this program is fit to be a Unix filter.
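A minimal sketch of such a filter might look like the following. It assumes whitespace tokenisation is good enough for your text, and the helper name remove_stopwords is made up for illustration. Unlike the code in the question, it reads every line of the input rather than only the first, and it lowercases each word before the lookup because the English stop word list is lowercase. Punctuation attached to a word (for example "the,") is not stripped, so such tokens will not match.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw( getStopWords );

# Hypothetical helper: split a line on whitespace and drop every
# token whose lowercased form is a key in the stop word hash.
sub remove_stopwords {
    my ($stopwords, $line) = @_;
    return join ' ', grep { !$stopwords->{ lc $_ } } split ' ', $line;
}

my $stopwords = getStopWords('en');

# Read from the files named on the command line, or from STDIN,
# and write the filtered text to STDOUT, one output line per input line.
while ( my $line = <> ) {
    print remove_stopwords( $stopwords, $line ), "\n";
}
```

Saved under a name of your choosing (remove_stopwords.pl here is only a placeholder), it can be run as `perl remove_stopwords.pl sample.txt > results_stopwords.txt`, which leaves the output file handling to the shell. If you would rather open the output file inside the program, note that the open call in the question has no mode and therefore opens results_stopwords.txt for reading; writing needs something like `open(my $out, '>', 'results_stopwords.txt')`.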