I am reading a Postfix mail log file into an array and then looping through it to extract messages. On the first pass, I’m checking for a match on the “to=” line and grabbing the message ID. After building an array of MSGIDs, I’m looping back through the array to extract information on the to=, from=, and client= lines.
What I’d like to do is remove a line from the array as soon as I’ve extracted the data from it, to make the processing a bit faster (i.e., one fewer line to check against).
Any suggestions? This is in Perl.
Edit: gbacon’s answer below was enough to get me rolling with a solid solution. Here’s the guts of it:
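
    use strict;
    use warnings;

    my %msg;
    while (<>) {
        my $line = $_;    # keep an untouched copy; the s!!! below edits $_

        # Lines tagged with a queue ID: strip the prefix, then collect fields.
        if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
            my $key = $1;
            push @{ $msg{$key}{$1} } => $2
                while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
        }

        # The "removed" line: record the timestamp and server for that message.
        # \w+\s+\d+ also copes with syslog's space-padded days ("Jan  5").
        if ($line =~ m!^(\w+\s+\d+ \d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!) {
            my $key = $3;
            push @{ $msg{$key}{date} }   => $1;
            push @{ $msg{$key}{server} } => $2;
        }
    }

    use Data::Dumper;
    $Data::Dumper::Indent = 1;
    print Dumper \%msg;
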
I’m sure that second regexp can be made more impressive, but it gets the job done for what I need. I can now take the hash of all messages and pull out the ones I’m interested in.
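For "pulling out the ones I'm interested in", here is one possible sketch: select every queue ID whose `to` list contains a given address. The hash below is hand-built in the same shape as the one the code above produces (queue ID, then field name, then an array of values). The addresses and `$wanted` are hypothetical. The values carry no angle brackets because the `<?` and `>` in the first regex strip them.

```perl
use strict;
use warnings;

# Hypothetical %msg in the shape built by the code above
# (queue ID => field => array ref); every value here is invented.
my %msg = (
    'BA1CE38965' => { to   => ['alice@example.org', 'bob@example.org'],
                      from => ['sender@example.com'] },
    '62D8438973' => { to   => ['carol@example.net'],
                      from => ['sender@example.com'] },
);

my $wanted = 'bob@example.org';    # hypothetical recipient of interest

# Keep the queue IDs whose 'to' list contains the wanted address.
# The inner grep returns a match count, which the outer grep treats as true/false.
# "|| []" avoids dying on messages that never logged a to= line.
my @ids = sort { $a cmp $b } grep {
    my $id = $_;
    grep { $_ eq $wanted } @{ $msg{$id}{to} || [] };
} keys %msg;

print "$_\n" for @ids;
```

The same pattern works for any other field, for example `from` or `server`.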
Thanks to all who answered.
Do it in a single pass, collecting each line’s fields into a hash keyed by queue ID as you read the log.
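A minimal single-pass sketch follows. It uses the pattern explained below and assumes the log lines look like standard Postfix syslog output. The sample lines are invented and read from a heredoc instead of `<>` so that the script runs standalone. Because of that, the dump it prints will differ from the result quoted further down.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample log lines, invented for illustration.
my @log = split /\n/, <<'END_LOG';
Jan 12 10:20:30 mx1 postfix/smtpd[1234]: BA1CE38965: client=mail.example.com[192.0.2.10]
Jan 12 10:20:31 mx1 postfix/qmgr[567]: BA1CE38965: from=<sender@example.com>, size=1234, nrcpt=2 (queue active)
Jan 12 10:20:32 mx1 postfix/smtp[890]: BA1CE38965: to=<alice@example.org>, relay=mx.example.org[198.51.100.5]:25, status=sent (250 OK)
Jan 12 10:20:32 mx1 postfix/smtp[890]: BA1CE38965: to=<bob@example.org>, relay=mx.example.org[198.51.100.5]:25, status=sent (250 OK)
Jan 12 10:20:33 mx1 postfix/qmgr[567]: BA1CE38965: removed
END_LOG

my %msg;    # queue ID => field name => [ values ]
for (@log) {    # aliases $_ to each line, as while (<>) would
    # Strip everything up to and including "QUEUEID: ", remembering the ID.
    if (s/^.*postfix\/\w+\[.+?\]: (\w+):\s*//) {
        my $key = $1;
        # Collect every to=/from=/client= field left on the line.
        push @{ $msg{$key}{$1} }, $2
            while /\b(to|from|client)=(.+?)(?:,|$)/g;
    }
}

$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper \%msg;
```

To process real logs, replace `for (@log)` with `while (<>)` and pass the log files on the command line. The trailing newline on each line is harmless, because `.` does not match it and `$` matches just before it.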
The code works by first looking for a queue ID (e.g., `BA1CE38965` and `62D8438973` in the result below), which we store in `$key`. Next, we find all matches on the current line (thanks to the `/g` switch) that look like `to=<...>`, `client=mail.example.com`, and so on, with or without the separating comma. Of note in the pattern are:

- `\b`: matches on a word boundary only (prevents matching `xxxto=<...>`)
- `(to|from|client)`: matches `to`, `from`, or `client`
- `(.+?)`: matches the field’s value with a non-greedy quantifier
- `(?:,|$)`: matches either a comma or the end of the string, without capturing into `$3`

The non-greedy `(.+?)` forces the match to stop at the first comma it encounters rather than the last. Otherwise, on a line containing `to=<foo@example.com>, other=123`, you’d get `<foo@example.com>, other=123` as the recipient!

Then, for each field matched, we `push` it onto the end of an array (because there may be multiple recipients, for example) keyed by both the queue ID and the field name. Take a look at the result:

    $VAR1 = {
      '62D8438973' => {
        'client' => [
          'localhost.localdomain[127.0.0.1]'
        ],
        'to' => [
          '<us...@test.com>',
          '<us...@test.com>'
        ],
        'from' => [
          '<mailt...@example.com>'
        ]
      },
      'BA1CE38965' => {
        'client' => [
          'mail.example.com[x.x.x.x]'
        ],
        'to' => [
          '<us...@test.com>',
          '<us...@test.com>'
        ],
        'from' => [
          '<mailt...@example.com>'
        ]
      }
    };

Now say you want to print all the recipients of the message whose queue ID is `BA1CE38965`: loop over the `to` array stored under that queue ID. Maybe you want to know only how many recipients: evaluate that same array in scalar context.
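Here is one way to write both lookups. It runs against a hand-built `%msg` in the shape shown in the dump. The addresses are hypothetical, and `$id` and `@to` are illustrative names.

```perl
use strict;
use warnings;

# Hand-built %msg in the parser's shape (queue ID => field => array ref);
# the addresses are hypothetical.
my %msg = (
    'BA1CE38965' => {
        client => ['mail.example.com[192.0.2.10]'],
        from   => ['<sender@example.com>'],
        to     => ['<alice@example.org>', '<bob@example.org>'],
    },
);

my $id = 'BA1CE38965';

# Copy the recipient list; "|| []" avoids dying under strict refs
# if this message never logged a to= line.
my @to = @{ $msg{$id}{to} || [] };

# Print every recipient, one per line.
print "$_\n" for @to;

# An array in scalar context yields its length: the recipient count.
my $count = @to;
print "$id has $count recipients\n";
```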
If you’re willing to assume each message has exactly one client, access it as element 0 of that message’s `client` array.
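Here is a sketch of the one-client case that first checks the assumption. The `%msg` entry is hypothetical. The host/IP split is an optional extra that assumes the value has the `host[ip]` form seen in the dump.

```perl
use strict;
use warnings;

# Hypothetical %msg entry shaped like the parser's output; values invented.
my %msg = (
    'BA1CE38965' => {
        client => ['mail.example.com[192.0.2.10]'],
        to     => ['<alice@example.org>', '<bob@example.org>'],
    },
);
my $id = 'BA1CE38965';    # illustrative queue ID

# Sanity-check the one-client assumption before relying on it.
my $n = @{ $msg{$id}{client} || [] };
warn "$id has $n client entries\n" if $n != 1;

# Under that assumption the client is simply element 0.
my $client = $msg{$id}{client}[0];

# Optional: split "host[ip]"; the greedy .* backs off to the last '['.
my ($host, $ip) = $client =~ /^(.*)\[(.*)\]$/;
print "client=$client host=$host ip=$ip\n";
```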