I have a text file and I am writing a parser for it using regular expressions in Perl.
I can split it into blocks with a regexp, because the blocks I need are separated by two empty lines.
The problem is that the whole text also has an introduction at the beginning and some text at the end that I do not need.
Here is the code that starts a new block when it finds two empty lines:
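#!/usr/bin/perl
use strict;
use warnings;

my $file = 'first';
open(my $fh, '<', $file) or die "Cannot open $file: $!";

my $empty     = 0;    # number of consecutive blank lines seen so far
my $block_num = 1;    # index of the block file currently being written
open(my $out, '>', "$block_num.txt") or die "Cannot open $block_num.txt: $!";

while (my $line = <$fh>) {
    chomp($line);
    if ($line =~ /^\s*$/) {
        $empty++;
    } elsif ($empty == 2) {
        # First non-blank line after exactly two blank lines: start a new block file
        close($out);
        $block_num++;
        open($out, '>', "$block_num.txt") or die "Cannot open $block_num.txt: $!";
        $empty = 0;
    } else {
        $empty = 0;
    }
    print $out "$line\n";
}
close($out);
close($fh);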
This is an example of the text I need (it’s really small :))
I think I need to iterate over the text until it reaches a line starting with LOREM IPSUM, using a regexp like /^LOREM IPSUM/, because that is the point where the text I need starts (and I would start saving the text to a file once I reach it).
And I need to stop iterating when the word INDEX is found, or else save the remaining text to a separate file.
How could I implement this? Should I use next to move on to the following line, or something else?
BR,
Yuliya
You’d change your while loop so that it uses next to skip header lines until you hit the line that starts with LOREM IPSUM, after which you will process lines.
You’d use a similar pattern for ignoring all lines after a given line match, except you wouldn’t have to process any more lines, so instead of using next you’d use last. That pattern is left as an exercise to the reader. 🙂
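For anyone who wants to check their attempt, here is a minimal sketch of the two ideas combined. It is not the code from the answer above. It assumes that the start marker is a line beginning with LOREM IPSUM, that the end marker is a line beginning with INDEX, and that the start line is kept while the end line is dropped. The helper name extract_body, the flag $in_body and the sample lines are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the lines from the first one that starts with
# "LOREM IPSUM" up to, but not including, the next one that starts with "INDEX".
sub extract_body {
    my @lines   = @_;
    my $in_body = 0;    # becomes true once the start marker has been seen
    my @body;
    for my $line (@lines) {
        if (!$in_body) {
            # Still in the introduction: skip lines until the start marker.
            next unless $line =~ /^LOREM IPSUM/;
            $in_body = 1;
        }
        # End marker reached: stop the loop, nothing after it is processed.
        last if $line =~ /^INDEX/;
        push @body, $line;
    }
    return @body;
}

# Made-up sample input, only for illustration.
my @sample = (
    'Introduction', 'more intro',
    'LOREM IPSUM', 'block one', '', '', 'block two',
    'INDEX', 'trailing text',
);
print "$_\n" for extract_body(@sample);
```

In the original script the same two tests would go at the top of the while loop body, before the blank-line counting, so that only the lines between the markers are written to the block files.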