I need some help with regex matching, please. I’m trying to match a double-quoted string within a larger string, where the quoted string can itself contain pairs of double quotes! Here’s an example:
"Please can ""you"" match this"
A fuller example of my problem, and where I’ve got to so far, is shown below. The code only stores ‘paris’ correctly in the hash; both london and melbourne come out wrong because the double-quote pair terminates the long description early.
Any help much appreciated.
use strict;
use warnings;
use Data::Dumper;
my %hash;
my $delimiter = '/begin CITY';
local $/ = $delimiter;
my $top_of_file = <DATA>;
my $records=0;
while(<DATA>) {
    my ($section_body) = m{^(.+)/end CITY}ms;
    $section_body =~ s{/\*.*?\*/}{}gs; # Remove any comments in string
    $section_body =~ m{ ^\s+(.+?)   ## Variable name is never whitespace separated
                                    ## Always underscored. Akin to C variable names
                        \s+(".*?")  ## The long description can itself contain
                                    ## pairs of double quotes ""like this""
                        \s+(.+)     ## Everything from here can be split on
                                    ## whitespace
                        \s+$
                      }msx;
    $hash{$records}{name} = $1;
    $hash{$records}{description} = $2;
    my (@data) = split ' ', $3;
    @{ $hash{$records} }{qw/ size currency /} = @data;
    ++$records;
}
print Dumper(\%hash);
__DATA__
Some header information
/begin CITY
london /* city name */
"This is a ""difficult"" string to regex"
big
Sterling
/end CITY
/begin CITY paris
"This is a simple comment to grab."
big
euro /* the address */
/end CITY
/begin CITY
Melbourne
"Another ""hard"" long description to 'match'."
big
Dollar
/end CITY
Change this:
to this:
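As a minimal sketch of that kind of change, assuming the part to replace is the description sub-pattern: (".*?") stops at the first quote it can, whereas a pattern such as ("(?:[^"]|"")*") consumes either a non-quote character or a doubled quote at each step, so only a lone quote can close the string. The helper name extract_description is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first double-quoted string in $text,
# treating "" inside the quotes as part of the string, not as a terminator.
sub extract_description {
    my ($text) = @_;
    my ($description) = $text =~ m{ ( " (?: [^"] | "" )* " ) }x;
    return $description;
}

# Record body modelled on the london entry from the question.
my $record = qq{london\n"This is a ""difficult"" string to regex"\nbig\nSterling\n};
print extract_description($record), "\n";
```

Run against the london record, this should print the whole description including both doubled-quote pairs.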
Also, your use of non-greedy matching isn’t very safe. Something like this:
could well end up including whitespace inside the variable-name, if Perl finds that that’s the only way to match. (It will prefer to stop before including whitespace, but it makes no guarantees.)
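To make that risk concrete, here is a small sketch using a hypothetical malformed record whose name contains a space. With .+? the regex still matches by letting the name swallow the space. With \S+ (an assumed replacement, based on the statement that the name is never whitespace separated) the match fails instead of silently capturing the wrong thing.

```perl
use strict;
use warnings;

# Hypothetical malformed input: the "name" field contains a space.
my $body = qq{ new york "A description" big dollar \n};

# Non-greedy dot: it expands across the space because that is the
# only way the rest of the pattern can match.
my ($loose) = $body =~ m{ ^\s+ (.+?) \s+ (".*?") }sx;

# \S+ cannot cross whitespace, so this pattern rejects the record.
my ($strict) = $body =~ m{ ^\s+ (\S+) \s+ (".*?") }sx;

print "loose:  ", defined $loose  ? "'$loose'"  : "no match", "\n";
print "strict: ", defined $strict ? "'$strict'" : "no match", "\n";
```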
And you should always check to make sure that m{} found something. If you’re sure that it will always match, then you can just tack on an "or die" to validate that.
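A sketch of what that check might look like, using a simplified pattern and a hypothetical helper named parse_record. Taking the captures from the match's return list, rather than from $1, $2 and $3, also means values left over from an earlier successful match cannot be picked up by mistake.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one record body, or die if it does not match.
sub parse_record {
    my ($body) = @_;
    # In list context the match returns its captures; an empty list
    # (no match) makes the assignment false, which triggers the die.
    my ($name, $description, $rest) =
        $body =~ m{ ^\s* (\S+) \s+ ( " (?: [^"] | "" )* " ) \s+ (.+?) \s* $ }sx
        or die "Could not parse record: $body";
    return {
        name        => $name,
        description => $description,
        fields      => [ split ' ', $rest ],
    };
}

my $rec = parse_record(qq{ london\n"This is a ""difficult"" string to regex"\nbig\nSterling\n});
print "$rec->{name}: $rec->{description} (@{ $rec->{fields} })\n";
```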