The Archive Base

Editorial Team
Asked: June 17, 2026

I wrote a program to extract attachments from mail folders ( GITHUB ) but


I wrote a program to extract attachments from mail folders (GITHUB),
but it fails because of Perl’s 32767 line limit on regex matching. My
program loads each mail message as a single string, and then tries to
match each base64-encoded file as a single string.

To replicate the problem, first do this:

(dd if=/dev/urandom bs=2000 count=1000 | base64 ; echo "\n\n\n" ; dd if=/dev/urandom bs=2000 count=1000 | base64 ) >! /tmp/testfile.txt 

This creates a single 5403516 byte file that contains the
base64-encoding of two files with a triple newline buffer between
them. The situation in production is a little more complex, but this
simpler case demonstrates the problem.

Our goal is to extract the base64 encoding of the first file. In other
words, all consecutive lines that are 50 characters or longer and
contain only base64 characters, stopping at the first “=” sign (the
padding character, which can only appear at the end of base64 data).
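For comparison, the criterion above can also be written as a plain line-by-line scan, which never asks the regex engine to repeat a group tens of thousands of times. This is only a sketch under the stated rules: the sub name first_base64_run and the sample message are made up for illustration and are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first run of consecutive base64 lines
# (each at least $min_len characters long). The run ends with the first
# short or "="-padded base64 line, or just before any other line.
sub first_base64_run {
    my ($text, $min_len) = @_;
    my @run;
    for my $line (split /\n/, $text) {
        if (length($line) >= $min_len && $line =~ m{\A[A-Za-z0-9+/]+\z}) {
            push @run, $line;            # a full, unpadded base64 line
        }
        elsif (@run && $line =~ m{\A[A-Za-z0-9+/]+=*\z}) {
            push @run, $line;            # short or padded final line
            last;
        }
        elsif (@run) {
            last;                        # anything else ends the run
        }
    }
    return join "\n", @run;
}

# Made-up sample: two header-like lines, then three base64 lines.
my $sample = join "\n", "Subject: demo", "", "A" x 60, "B" x 60, "QUJD=", "";
print length(first_base64_run($sample, 50)), "\n";
```

Each regex here is applied to one line at a time, so the number of lines in the attachment only affects the loop count.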

/tmp/testfile.txt has 70180 lines, with the first 35088 lines
representing the string we want to capture (the base64 encoding of the
first file).
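The byte and line counts quoted above can be cross-checked with a little arithmetic. The sketch below assumes that base64 wraps its output at 76 columns and that the echo in the shell command emits four newline characters in total; the helper name expected_sizes is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: predict the shape of the test file from the
# number of random bytes in each of its two encoded files.
sub expected_sizes {
    my ($raw_bytes) = @_;
    my $b64_chars = 4 * ceil($raw_bytes / 3);   # 4 output characters per 3 input bytes
    my $lines     = ceil($b64_chars / 76);      # assumption: output wrapped at 76 columns
    my $per_file  = $b64_chars + $lines;        # one "\n" after every line
    my $separator = 4;                          # assumption: echo emits 4 newlines in total
    return ($lines, 2 * $per_file + $separator, 2 * $lines + $separator);
}

my ($lines_per_file, $total_bytes, $total_lines) = expected_sizes(2000 * 1000);
print "$lines_per_file lines per encoded file\n";
print "$total_bytes bytes and $total_lines lines in total\n";
```

Under those two assumptions the results agree with the 35088, 5403516 and 70180 figures given in the question.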

We now do the following in Perl:

# next 4 lines: read the entire file into a single variable 
undef $/; 
open(A,"/tmp/testfile.txt"); 
$all = <A>; 
close(A); 

# the output of base64 consists of these characters (plus "=" and 
# "\n", but those two are special cases) 
my($chars) = "[a-zA-Z0-9\+\/]"; 

# we declare a subroutine for testing 
sub foo {print STDERR length($_[0]),"\n";} 

# this is what I tried to do originally 
$all=~s/(\n($chars{50,}\=*\n)+)($chars+\=*\n)/foo("$1$3")/seg; 

The above prints “2523137”, then “178467”, then “2523137”, then “178544”
on STDERR.

In other words, it captures the first 2523137 characters of the first
file, then the next 178467 characters of the first file, instead of
capturing all 2701604 characters of the first file as I want. Note
that 2523137 is approximately 77*32767 (and each line of
/tmp/testfile.txt is 77 characters long).
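The "approximately" can be made exact. Assuming the captured text starts with the single "\n" that the pattern requires and every following line is 77 characters long including its newline, the first length breaks down as one newline plus a whole number of lines. The helper name full_lines_in is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: how many 77-character lines fit in a capture
# that starts with one extra "\n".
sub full_lines_in {
    my ($captured_length, $line_length) = @_;
    return ($captured_length - 1) / $line_length;
}

my $lines = full_lines_in(2523137, 77);
print "2523137 = 1 + 77 * $lines\n";
print "of which ", $lines - 1, " lines came from the repeated group\n";
print "2523137 + 178467 = ", 2523137 + 178467, "\n";
```

Under this reading, the first capture holds 32768 lines: 32767 repetitions of the quantified group plus the one line matched by the final group, which is consistent with a cap of 32767 repetitions.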

@ikegami, if I understand correctly, your approach is:

$all=~s/((\n($chars{50,}\=*\n){0,20000})+)($chars+\=*\n)//seg; 

In other words, capture 20000 lines at a time (avoiding the 32767 line
limit), but capture multiple bunches of 20000 lines. Is this correct?

Since the results will come out in multiple variables, I didn’t pass
the result to foo(), but instead printed the results to STDERR like
this:

print STDERR "1 is $1\n"; 
print STDERR "2 is $2\n"; 
print STDERR "3 is $3\n"; 
print STDERR "4 is $4\n"; 
print STDERR "5 is $5\n"; 
print STDERR "6 is $6\n"; 

This yields $1 and $2 as identical 15085-line variables, $3 and $4 as
non-identical one-line variables, and $5 and $6 as empty.

Thus, I think I misunderstood your approach. Help?
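Part of the confusion may come from how Perl numbers capture groups. Groups are numbered by the position of their opening parenthesis, and a group under a quantifier keeps only its last repetition, so repeating a group never creates extra variables. The pattern above has four groups, which is why $5 and $6 are empty. A minimal sketch with a made-up string and a hypothetical sub name:

```perl
use strict;
use warnings;

# Same nesting as the pattern above: an outer group, a repeated group
# inside it, a group inside that one, and a final group.
sub captures {
    my ($string) = @_;
    return $string =~ /((a(\d))+)(z)/ ? ($1, $2, $3, $4) : ();
}

my @got = captures("a1a2a3z");
print join(" | ", @got), "\n";   # prints: a1a2a3 | a3 | 3 | z
```

Here $1 holds everything the outer group matched, while $2 and $3 hold only the last repetition. By the same rule, identical $1 and $2 in the output above would mean the outer group matched exactly once.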

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 7:57 pm

    Since your base64 pieces are separated by a static string, you can set $/ (the input record separator) to split up the file much more efficiently, and then check whether each piece matches your criterion.

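    use strict;
    use warnings;
    use autodie;

    # With /m, ^ and $ anchor at line boundaries, so a record is accepted
    # as soon as one of its lines consists solely of base64 characters.
    my $is_base64 = qr{^[a-zA-Z0-9\+\/]+\n?$}m;

    {
        # Three-argument open; autodie reports a failure to open.
        open(my $fh, '<', "/tmp/testfile.txt");
        # Read one record per "=\n" instead of one per line.
        local $/ = "=\n";

        while(my $base64 = <$fh>) {
            chomp $base64;      # removes the trailing "=\n" separator
            _strip(\$base64);
            next unless $base64 =~ $is_base64;

            print STDERR length $base64, "\n";
        }
    }

    # Trim leading and trailing whitespace in place.
    sub _strip {
        my $ref = shift;
        $$ref =~ s{^\s+}{};
        $$ref =~ s{\s+$}{};

        return;
    }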
    

    This is also handy for splitting up mailboxes: set $/ to "\n\nFrom ".
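As a rough illustration of the mailbox case, the sketch below reads mbox-style text through an in-memory filehandle with $/ set to "\n\nFrom ". The sub name split_mbox and the two sample messages are made up. It assumes that "From " after a blank line only ever starts a new message, which holds only if body lines beginning with "From " have been escaped by whatever wrote the mailbox.

```perl
use strict;
use warnings;

# Hypothetical helper: split mbox-style text into individual messages.
sub split_mbox {
    my ($mbox_text) = @_;
    open(my $fh, '<', \$mbox_text) or die "cannot open in-memory handle: $!";
    local $/ = "\n\nFrom ";
    my @messages;
    while (my $message = <$fh>) {
        chomp $message;                             # drops a trailing "\n\nFrom "
        $message = "From $message" if @messages;    # restore the prefix the separator consumed
        push @messages, $message;
    }
    close $fh;
    return @messages;
}

my @messages = split_mbox(
    "From alice Mon Jan 1\nSubject: one\n\nfirst body\n\n"
  . "From bob Tue Jan 2\nSubject: two\n\nsecond body\n"
);
print scalar(@messages), " messages\n";
```

Because the separator includes the leading "From " of the next message, every record after the first needs that prefix put back.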

    But the comments suggesting that you should be doing this with a module are correct. There are a lot of mail modules on CPAN, so it can be a bit difficult to find the right one.
