This currently prints all the nouns with sentences they are found in right below.

Question

Asked: May 22, 2026

#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend";    ## CHANGE "..." to <>

open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;

my @sentences = <$tag_corpus>;    # This breaks up each line into list
my @words;
my %seens = ();
my %seenw = ();

for (my $i = 0; $i <= @sentences; $i++) {
    if (defined($sentences[$i]) and $sentences[$i] =~ /($search_key)_VB.*/i) {
        @words = split /\s/, $sentences[$i];    ## \s is a whitespace
        for (my $j = 0; $j <= @words; $j++) {
            #FILTER if word is noun, and therefore will end with _NN:
            if (defined($words[$j]) and $words[$j] =~ /_NN/) {
                #PRINT word (without _NN) and sentence (without any _ENDING):
                next if $seenw{$words[$j]}++;    ## How to include plural etc
                push @words, $words[$j];
                print "**", split(/_\S+/, $words[$j]), "**", "\n";
                ## next if $seens{ $sentences[$i] }++;
                ## push @sentences, $sentences[$i];
                print split(/_\S+/, $sentences[$i]), "\n"
                ## HOW PRINT bold or specifically word bold?
                #FILTER if word has been output, add sentence under that heading
            }
        }    ## put print sentences here to print each sentence after all the nouns inside
    }
}
close $tag_corpus || die "Can't close $tag_corpus: $!";

1 Answer

Editorial Team · Answer 1 · 2026-05-22T00:32:24+00:00

Your original:

#!/usr/bin/perl
use strict;
use warnings FATAL => "all";

That’s a good start…

my $search_key = "expend";    ## CHANGE "..." to <>

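Since you’re going to use this in a regex in a loop, it’s better to compile the
regex right now: my $verb_regex = qr/\bexpend_VB\b/i. I put word boundaries in
there, because it seems like you need them. Note that the trailing \b stops the
pattern from matching a longer tag that starts with _VB (such as _VBD, if your
tagset has one); drop it if you want those lines too.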
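
To make the effect of the word boundaries concrete, here is a minimal sketch. The sample lines and the names $exact_vb and $any_vb are made up for illustration, and the _VBP tag is an assumption about your tagset; the \Q...\E quoting is only there in case the search key ever contains regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical search key; the original hard-codes "expend".
my $search_key = "expend";

# Compile each pattern once, outside any loop.
my $exact_vb = qr/\b\Q$search_key\E_VB\b/i;    # only the bare _VB tag
my $any_vb   = qr/\b\Q$search_key\E_VB/i;      # any tag starting with _VB

# Made-up sample lines in word_TAG form.
my @samples = (
    "We_PRP must_MD expend_VB energy_NN",
    "They_PRP expend_VBP energy_NN",
    "No_DT verb_NN here_RB",
);

for my $line (@samples) {
    printf "%-40s exact=%d any=%d\n",
        $line,
        ( $line =~ $exact_vb ? 1 : 0 ),
        ( $line =~ $any_vb   ? 1 : 0 );
}
```

The second sample only matches the pattern without the trailing \b, because there is no word boundary between the B and the P of _VBP.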

open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;

my @sentences = <$tag_corpus>;    # This breaks up each line into list
my @words;
my %seens = ();
my %seenw = ();

for (my $i = 0; $i <= @sentences; $i++) {

This does much the same with less overhead:

while ( <$tag_corpus> ) { 
    ...
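
As a rough, self-contained sketch of that loop shape: the in-memory string below is a made-up stand-in for ch13tagged.txt, and $fh, $line and $count are illustrative names. If you do keep the C-style loop over an array instead, the bound should be $i < @sentences.

```perl
use strict;
use warnings;

# Made-up stand-in for ch13tagged.txt so the sketch runs on its own.
my $sample = "one_CD line_NN\ntwo_CD lines_NNS\n";
open( my $fh, '<', \$sample ) or die "Can't open in-memory file: $!";

my $count = 0;
while ( my $line = <$fh> ) {    # Perl adds the defined() test; stops at EOF
    chomp $line;
    $count++;
    print "$count: $line\n";
}
close $fh or die "Can't close in-memory file: $!";
```

Only one line is held in memory at a time, and there is no index to get wrong.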

Back to yours:

    if (defined($sentences[$i]) and $sentences[$i] =~ /($search_key)_VB.*/i) {

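Every line you read still ends with the record separator (unless you chomp it), so
each element is defined right up to the end of the file. The only undefined element
you can reach is $sentences[@sentences], and only because the loop tests
$i <= @sentences instead of $i < @sentences. Fix the bound and there’s no need to
test for defined.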

Additionally, you don’t need the .* after the search term and capturing the $search_key
here has no effect.

        @words = split /\s/, $sentences[$i];    ## \s is a whitespace

You don’t want to split on a single whitespace character: a run of spaces then
produces empty fields. You should use /\s+/, but even better is:
@words = split ' ', $sentences[$i];
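
For reference, a small sketch of how the three forms differ. The sample line is made up, and @single, @plus and @awk are illustrative names.

```perl
use strict;
use warnings;

# Made-up line with leading and doubled whitespace.
my $line = "  the_DT  cat_NN sat_VBD\n";

my @single = split /\s/,  $line;    # every whitespace character separates: empty fields appear
my @plus   = split /\s+/, $line;    # runs collapse, but leading whitespace still gives one empty field
my @awk    = split ' ',   $line;    # leading whitespace is skipped as well

for my $result ( [ '/\s/', \@single ], [ '/\s+/', \@plus ], [ "' '", \@awk ] ) {
    my ( $label, $fields ) = @$result;
    print "$label: ", join( ',', map { "[$_]" } @$fields ), "\n";
}
```

Only the split ' ' form gives you exactly the words and nothing else.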

But you won’t even need that.

        for (my $j = 0; $j <= @words; $j++) {
            #FILTER if word is noun, and therefore will end with _NN:
            if (defined($words[$j]) and $words[$j] =~ /_NN/) {
                #PRINT word (without _NN) and sentence (without any _ENDING):

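But that’s all you’re if-ing on: words tagged _NN. In addition, every element of
the list returned by split is defined, so there’s no need to test. The only undefined
element is the one past the end, which $j <= @words reaches; use $j < @words.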

                next if $seenw{$words[$j]}++;    ## How to include plural etc

Unless you want to reset %seenw after each sentence, you’ll only process each _NN
word once per file.
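
A minimal sketch of the difference, assuming made-up sample sentences and a hypothetical helper named nouns_per_sentence; the first argument says whether the seen-hash is reset for every sentence.

```perl
use strict;
use warnings;

# Made-up tagged sentences; "budget_NN" appears in both.
my @lines = (
    "They_PRP expend_VB the_DT budget_NN",
    "We_PRP expend_VB the_DT budget_NN and_CC time_NN",
);

# Returns one array ref per sentence holding the nouns that pass the
# uniqueness check.
sub nouns_per_sentence {
    my ( $reset_each_sentence, @sentences ) = @_;
    my %seen;
    my @result;
    for my $sentence (@sentences) {
        %seen = () if $reset_each_sentence;
        push @result, [ grep { /_NN$/ && !$seen{$_}++ } split ' ', $sentence ];
    }
    return @result;
}

my @per_file     = nouns_per_sentence( 0, @lines );
my @per_sentence = nouns_per_sentence( 1, @lines );

print "second sentence, one hash per file:     @{ $per_file[1] }\n";
print "second sentence, one hash per sentence: @{ $per_sentence[1] }\n";
```

With a single hash for the whole file, budget_NN is dropped from the second sentence because the first sentence already claimed it.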

                push @words, $words[$j];

I don’t see how this push can serve any possible purpose by appending nouns
back on the list of words. Sure, you’ve got the uniqueness check before it to save
you from an infinite loop if there are any _NN words, but it just means you’ll have
all the words in the sentence, followed by all the “nouns”. Not only that, but when
the loop reaches those appended words you simply test that each is a noun and do
nothing with it. Not to mention that you clobber the list with the next sentence.

                print "**", split(/_\S+/, $words[$j]), "**", "\n";

                ## next if $seens{ $sentences[$i] }++;

You don’t want to do this in the word loop.

                ## push @sentences, $sentences[$i];

Again, I don’t think you would want this inside the word loop if it were
uncommented. It seems like everything from two lines ago onward belongs after the
word loop.

                print split(/_\S+/, $sentences[$i]), "\n"
                ## HOW PRINT bold or specifically word bold?
                #FILTER if word has been output, add sentence under that heading
            }
        }    ## put print sentences here to print each sentence after all the nouns inside
    }
}
close $tag_corpus || die "Can't close $tag_corpus: $!";

Nope. That won’t handle the bad return from close. The || is “binding” too
tightly. You are closing either $tag_corpus or the output of die. Luckily (or perhaps unluckily)
the die never gets called because if we got this far, $tag_corpus should be a
true value.

This is a kind of cleaned-up version of what you’re trying to do, with the
parts that I can make sense of left in.

#!/usr/bin/perl
use strict;
use warnings FATAL => "all";

# Compiled once, outside the loop (see the note on word boundaries above).
my $verb_regex = qr/\bexpend_VB\b/i;
my ( %seens, %seenw );

open( my $tag_corpus, '<', "ch13tagged.txt" ) or die $!;

my @sentences;
# We're processing a single line at a time.
while ( <$tag_corpus> ) {
    # Test if we want to work with the line
    next unless m/$verb_regex/;
    # If we do, then test that we haven't dealt with it before
    # Although I suspect that this may not be needed as much if we're not
    # pushing to a queue that we're reading from.
    next if $seens{ $_ }++;

    # split -> split ' ', $_
    # pass through only those words that match _NN at the end and
    # are unique so far. We test on a substitution, because the result
    # still uniquely identifies a noun
    foreach my $noun ( grep { s/_NN$// && !$seenw{ $_ }++ } split ) {
        print "**$noun**\n";
    }
    # This will omit any adjacent punctuation you have after the word--if
    # that's a problem.
    print split( /_\S+/ ), "\n";
    # Here we save the sentence.
    push @sentences, $_;
}
close $tag_corpus or die "Can't close ch13tagged.txt: $!";
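
The question's comments also ask about adding each sentence under the heading of a noun that has already been output. One possible way to do that, sketched here under assumptions, is to collect sentences in a hash of arrays and print at the end. The sample lines and the names %sentences_for and @noun_order are made up for illustration, and this is not necessarily the grouping you have in mind.

```perl
use strict;
use warnings;

# Made-up tagged lines standing in for ch13tagged.txt.
my @lines = (
    "They_PRP expend_VB energy_NN on_IN research_NN",
    "We_PRP expend_VB energy_NN daily_RB",
    "Nothing_NN happens_VBZ here_RB",
);

my $verb_regex = qr/\bexpend_VB\b/i;

my %sentences_for;    # noun => [ plain sentences containing it ]
my @noun_order;       # nouns in order of first appearance

for my $line (@lines) {
    next unless $line =~ $verb_regex;

    # Strip every _TAG to get the readable sentence.
    ( my $plain = $line ) =~ s/_\S+//g;

    my %in_this_line;
    for my $word ( split ' ', $line ) {
        next unless $word =~ s/_NN$//;     # keep nouns, drop the tag
        next if $in_this_line{$word}++;    # list a sentence once per noun
        push @noun_order, $word unless exists $sentences_for{$word};
        push @{ $sentences_for{$word} }, $plain;
    }
}

# Print each noun as a heading, followed by its sentences.
for my $noun (@noun_order) {
    print "**$noun**\n";
    print "$_\n" for @{ $sentences_for{$noun} };
}
```

Deferring the printing until the whole input has been read is what lets a later sentence land under a heading that was first seen earlier.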
