This currently prints every noun, with the sentence it was found in printed right below it.
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my @sentences = <$tag_corpus>; # This breaks up each line into list
my @words;
my %seens = ();
my %seenw = ();
for (my $i = 0; $i <= @sentences; $i++) {
if (defined($sentences[$i]) and $sentences[$i] =~ /($search_key)_VB.*/i) {
@words = split /\s/, $sentences[$i]; ## \s is a whitespace
for (my $j = 0; $j <= @words; $j++) {
#FILTER if word is noun, and therefore will end with _NN:
if (defined($words[$j]) and $words[$j] =~ /_NN/) {
#PRINT word (without _NN) and sentence (without any _ENDING):
next if $seenw{$words[$j]}++; ## How to include plural etc
push @words, $words[$j];
print "**", split(/_\S+/, $words[$j]), "**", "\n";
## next if $seens{ $sentences[$i] }++;
## push @sentences, $sentences[$i];
print split(/_\S+/, $sentences[$i]), "\n"
## HOW PRINT bold or specifically word bold?
#FILTER if word has been output, add sentence under that heading
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";
Your original:
That’s a good start…
Since you’re going to use this in a regex in a loop, it’s better to compile the regex right now:
my $verb_regex = qr/\bexpend_VB\b/i;
I put word boundaries in there, because it seems like you need them.
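Here is a small sketch of what those word boundaries do, using made-up tokens. The leading \b rejects overexpend_VB. The trailing \b also rejects expend_VBD, because B and D are both word characters. If past-tense and other _VB* tags should count, dropping the trailing \b is one option.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compiled once; the \b anchors stop it matching inside longer tokens.
my $verb_regex = qr/\bexpend_VB\b/i;

# Hypothetical tagged tokens, made up for illustration.
my @tokens = qw(expend_VB EXPEND_VB overexpend_VB expend_VBD expenditure_NN);

my %matches;
for my $token (@tokens) {
    $matches{$token} = ($token =~ $verb_regex) ? 1 : 0;
    printf "%-16s %s\n", $token, $matches{$token} ? 'match' : 'no match';
}
```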
This does much of the same with less overhead:
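Below is a minimal sketch of that idea. It assumes the overhead being saved is the copy of the whole file in @sentences, so it reads one line at a time with while and skips lines that don't match. An in-memory sample with made-up sentences stands in for ch13tagged.txt so the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $verb_regex = qr/\bexpend_VB\b/i;

# Made-up sample in word_TAG format, read through an in-memory filehandle.
# A real run would open the corpus instead, e.g.
#   open(my $tag_corpus, '<', 'ch13tagged.txt') or die "Can't open ch13tagged.txt: $!";
my $sample = "We_PRP expend_VB energy_NN ._.\n"
           . "Cats_NNS sleep_VBP ._.\n"
           . "They_PRP expend_VB effort_NN ._.\n";
open(my $tag_corpus, '<', \$sample) or die "Can't open sample: $!";

my $matched = 0;
# Only the current line is in memory; nothing is copied into @sentences.
while (my $sentence = <$tag_corpus>) {
    next unless $sentence =~ $verb_regex;
    $matched++;
    print $sentence;   # the noun handling would go here
}
close($tag_corpus) or die "Can't close sample: $!";
```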
Back to yours:
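If the line contains the record separator (and it will, unless you chomp it), you’ll always be getting a defined line until the end of the file, so there’s no need to test for defined. (The one undefined element your loop sees is the index one past the end, which it reaches because it runs while $i <= @sentences; use $i < @sentences instead.)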
Additionally, you don’t need the .* after the search term, and capturing the $search_key here has no effect.
You don’t want to split on a single whitespace character. You should use /\s+/, but even better is:
@words = split ' ', $sentences[$i];
But you won’t even need that.
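This sketch compares the three split forms on one made-up tagged line that has leading and doubled spaces. split /\s/ yields empty fields. split /\s+/ still yields one leading empty field. The special string ' ' skips leading whitespace, as awk does. In all three, trailing empty fields (from the newline) are dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up tagged line with leading and doubled spaces.
my $line = "  The_DT cost_NN  rose_VBD\n";

my @single = split /\s/,  $line;  # one whitespace char per split: empty fields appear
my @runs   = split /\s+/, $line;  # runs of whitespace, but a leading empty field remains
my @awk    = split ' ',   $line;  # special case: like /\s+/ but leading whitespace is skipped

print scalar(@single), " fields: ", join('|', @single), "\n";
print scalar(@runs),   " fields: ", join('|', @runs),   "\n";
print scalar(@awk),    " fields: ", join('|', @awk),    "\n";
```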
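But that’s all you’re if-ing on: words tagged _NN (the unanchored /_NN/ also matches _NNS, _NNP and _NNPS). In addition, every element of the list from a split will be defined (again, once the loop stops at $j < @words), so there’s no need to test. Unless you want to reset %seenw after each sentence, you’ll only process each _NN word once per file.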
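Here is a minimal sketch of the difference, assuming the same word_TAG format. The two sentences and the array names are made up for illustration. Where the %seen hash is declared decides whether a noun is reported once per file or once per sentence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two made-up sentences that share the noun cost_NN.
my @sentences = ("The_DT cost_NN rose_VBD", "We_PRP cut_VBD the_DT cost_NN");

my %seen_in_file;          # declared once: a noun is reported only the first time in the file
my (@per_file, @per_sentence);

for my $sentence (@sentences) {
    my %seen_in_sentence;  # declared inside the loop: starts empty for every sentence
    for my $word (split ' ', $sentence) {
        next unless $word =~ /_NN/;
        push @per_sentence, $word unless $seen_in_sentence{$word}++;
        push @per_file,     $word unless $seen_in_file{$word}++;
    }
}

print "per file:     @per_file\n";      # cost_NN once
print "per sentence: @per_sentence\n";  # cost_NN once for each sentence
```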
I don’t see how this push can serve any possible purpose by appending nouns back onto the list of words. Sure, you’ve got the uniqueness check before it to save you from an infinite loop if there are any _NN words, but it just means you’ll have all the words in the sentence, followed by all the “nouns”. Not only that, but you’re simply going to test that each one is a noun and do nothing with it. Not to mention that you clobber the list with the next sentence.
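The sketch below shows the alternative on a made-up sentence: collect the nouns in a separate @nouns array, a name chosen here for illustration, and leave @words alone. perlsyn also warns that adding elements to an array while a foreach loop is iterating over it will confuse the loop. Growing the list you are walking is best avoided whatever loop style you use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "The_DT cost_NN of_IN fuel_NN rose_VBD";   # made-up example

my @words = split ' ', $sentence;
my @nouns;   # nouns go in their own list instead of being pushed back onto @words

for my $word (@words) {
    push @nouns, $word if $word =~ /_NN/;
}

print scalar(@words), " words, ", scalar(@nouns), " nouns: @nouns\n";
```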
You don’t want to do this inside the word loop.
Again, I don’t think you would want to do this even if it were uncommented and moved outside the word loop. It seems like everything from two lines back belongs after the word loop.
Nope. That won’t handle a bad return from close. The || (“or”) is “binding” too tightly. You are closing either $tag_corpus or the output of die. Luckily (or perhaps unluckily), the die never gets called, because if we got this far, $tag_corpus should be a true value.
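This is a minimal sketch of a close that does report failure. An in-memory string stands in for ch13tagged.txt so the sketch runs on its own. The low-precedence or, with close's argument in parentheses, leaves no doubt about grouping. Naming the file in the message is also more useful than interpolating $tag_corpus, which prints something like GLOB(0x...).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file   = 'ch13tagged.txt';   # only used in the messages here
my $sample = "expend_VB\n";      # made-up stand-in for the file's contents
open(my $tag_corpus, '<', \$sample) or die "Can't open $file: $!";
my @lines = <$tag_corpus>;

# `or` sits at the bottom of Perl's precedence table, so close(...) is
# evaluated first and die runs only if close returns false.
close($tag_corpus) or die "Can't close $file: $!";
print "read ", scalar(@lines), " line(s)\n";
```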
This is a kind of cleaned-up version of what you’re trying to do, with the parts that I can make sense of left in.
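Here is a minimal sketch of such a cleanup. It assumes one sentence per line with tokens written as word_TAG. It compiles the verb regex once, using \Q...\E so the search key is matched literally. It reads one line at a time and keeps the %seen hash per sentence. It collects nouns without touching the word list and prints the sentence once, after the word loop. The sample sentences, the @report array, and taking the file name from @ARGV are illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumptions: one sentence per line, tokens written as word_TAG.
# Pass a file name (for example ch13tagged.txt) as the first argument; with
# no argument, a small made-up sample is read instead so the script runs alone.
my $search_key = 'expend';
my $verb_regex = qr/\b\Q$search_key\E_VB\b/i;   # compiled once, used in the loop

my $sample = "We_PRP expend_VB energy_NN and_CC time_NN on_IN time_NN ._.\n"
           . "Cats_NNS sleep_VBP ._.\n";
my $source = @ARGV ? $ARGV[0] : \$sample;
open(my $tag_corpus, '<', $source) or die "Can't open input: $!";

my @report;   # every output line, kept so the result can be checked
while (my $sentence = <$tag_corpus>) {      # one line in memory at a time
    chomp $sentence;
    next unless $sentence =~ $verb_regex;

    my %seen;   # declared per sentence: each noun is listed once per sentence
    for my $token (split ' ', $sentence) {
        next unless $token =~ /_NN/;       # also matches _NNS, _NNP, _NNPS
        next if $seen{$token}++;
        (my $noun = $token) =~ s/_\S+//;   # drop the tag
        push @report, "**$noun**";
    }

    # Print the sentence once, after the word loop, with all tags removed.
    (my $plain = $sentence) =~ s/_\S+//g;
    push @report, $plain;
}
close($tag_corpus) or die "Can't close input: $!";

print "$_\n" for @report;
```

Run with no arguments, it lists **energy** and **time** once each, then the untagged sentence. The second sample line is skipped because it has no expend_VB token.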