I have a text and I write a parser for it using regular expressions


Asked: May 19, 2026 (2026-05-19T04:21:16+00:00)


I have a text file and I am writing a parser for it using regular expressions in Perl.

I can already match what I need with a regexp, because the blocks of text I want follow a pattern: each one starts after two empty lines.

The problem is that the whole text also has an introduction at the beginning and some text at the end that I do not need.

Here is the code that starts a new output file whenever it finds two empty lines:

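#!/usr/bin/perl

use strict;
use warnings;

my $file = 'first';
open(my $fh, '<', $file) or die "Cannot open $file: $!";
my $empty = 0;        # consecutive empty lines seen so far
my $block_num = 1;    # number of the current output file
open(my $out, '>', "$block_num.txt") or die "Cannot open $block_num.txt: $!";

while (my $line = <$fh>) {
    chomp($line);
    if ($line =~ /^\s*$/) {
        $empty++;
    } elsif ($empty == 2) {
        # first non-empty line after exactly two empty lines: start a new block
        close($out);
        $block_num++;
        open($out, '>', "$block_num.txt") or die "Cannot open $block_num.txt: $!";
        $empty = 0;
    } else {
        $empty = 0;
    }
    print $out "$line\n";
}
close($out);
close($fh);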

This is an example of the text I need (it’s really small :))

this is file example

I think I need to iterate over the text until it reaches the line LOREM IPSUM, matched with a regexp like “/^LOREM IPSUM/”, because that is the point where the text I need starts (and save the text in one file when I reach that line).
Then I need to stop iterating over the text when the word INDEX is found, or save the remaining text in a separate file.

How could I implement this? Should I use next to skip lines, or something else?

BR,
Yuliya


1 Answer

Editorial Team · Answer 1 · 2026-05-19T04:21:16+00:00

You’d change your while loop to something like this:

my $in_lorem = 0;
while (my $line = <$fh>) {
  if( $line =~ /^LOREM IPSUM/ ) {
    $in_lorem = 1;
    next;
  }
  next unless $in_lorem;
  # your processing goes here
}

This will skip header lines until you hit the line that starts with LOREM IPSUM, after which you will process lines.

You’d use a similar pattern for ignoring all lines after a given line match, except you wouldn’t have to process any more lines, so instead of using next you’d use last. That pattern is left as an exercise to the reader. 🙂
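For anyone who wants to see the two ideas together, here is a minimal sketch of one way it could look. It assumes the end marker is a line that starts with INDEX (the question only says "INDEX word", so the /^INDEX/ anchor is a guess), and the helper name extract_body and the sample text are made up for illustration. It reads from an in-memory filehandle so it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between the first line
# starting with LOREM IPSUM and the next line starting with INDEX.
sub extract_body {
    my ($fh) = @_;
    my $in_lorem = 0;
    my @body;
    while (my $line = <$fh>) {
        if (!$in_lorem) {
            # still in the introduction: wait for the start marker
            $in_lorem = 1 if $line =~ /^LOREM IPSUM/;
            next;    # skip intro lines and the marker line itself
        }
        last if $line =~ /^INDEX/;    # assumed end marker: stop reading entirely
        push @body, $line;
    }
    return @body;
}

# Made-up sample input, read through an in-memory filehandle.
my $text = "intro line\nLOREM IPSUM\nblock one\n\n\nblock two\nINDEX\ntrailing text\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @body = extract_body($fh);
close($fh);
print @body;
```

In the real script you would probably keep the question's block-splitting logic where the push is, so that only the lines between the two markers ever reach the output files.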
