Is there a reliable way to detect blank pages with a Perl script? I tried the following script, which uses the getPageText method. The problem with this approach is that pages which contain only graphics and no text are also reported as blank.
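#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;

my $filename = $ARGV[0];
my $doc = CAM::PDF->new($filename) || die "$CAM::PDF::errstr\n";
my $pages = $doc->numPages();
print "$pages\n";

# Check the first page on its own.
my $content = $doc->getPageText(1);
$content = '' unless defined $content;
print "length" . length($content) . "\n";
if (length($content) == 0) {
    print "File is empty\n";
}

# Check every page for at least one alphanumeric character.
foreach my $p (1 .. $doc->numPages()) {
    my $str = $doc->getPageText($p);
    $str = '' unless defined $str;
    if ($str =~ m/[[:alnum:]]+/ms) {    # actually returned text
        print "Result text:" . qq($str) . "\n";
    }
    else {
        print "Page $p has no text\n";
    }
}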
Is there another approach to find blank pages?
Sorry, there is no way to reliably detect blank pages.
However, I did this in the past:
I used pdftk to burst the PDF into single-page PDF documents.
If the size of one of these PDFs is very small, it does not contain images.
If pdftotext returns an empty string, it does not contain text.
Use pdftk to assemble all the good PDFs into one.
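The steps above can be sketched roughly as follows. This is only a sketch, not the exact script I used: it assumes pdftk and pdftotext are on your PATH, and the 2048-byte cutoff, the pg_%04d.pdf naming and the sub names looks_blank and remove_blank_pages are all made up for illustration. Pick the cutoff by looking at the sizes of pages you know are blank in your own files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumed cutoff in bytes; tune it against known blank pages from your own files.
my $MIN_BYTES = 2048;

# A page looks blank when its single-page PDF is tiny AND it has no visible text.
sub looks_blank {
    my ($size_bytes, $text, $min_bytes) = @_;
    $text = '' unless defined $text;
    return ($size_bytes < $min_bytes && $text !~ /\S/) ? 1 : 0;
}

# Split $in into pages, keep the non-blank ones, and join them into $out.
sub remove_blank_pages {
    my ($in, $out) = @_;
    my $dir = tempdir(CLEANUP => 1);

    # pdftk burst writes one file per page, named by the printf-style pattern.
    system('pdftk', $in, 'burst', 'output', "$dir/pg_%04d.pdf") == 0
        or die "pdftk burst failed\n";

    my @keep;
    for my $page (sort glob("$dir/pg_*.pdf")) {
        # "-" makes pdftotext write the extracted text to standard output.
        open(my $fh, '-|', 'pdftotext', $page, '-')
            or die "cannot run pdftotext: $!\n";
        my $text = do { local $/; <$fh> };
        close $fh;
        my $size = (-s $page) || 0;
        push @keep, $page unless looks_blank($size, $text, $MIN_BYTES);
    }
    die "every page looks blank\n" unless @keep;

    # Reassemble the pages that were kept, in their original order.
    system('pdftk', @keep, 'cat', 'output', $out) == 0
        or die "pdftk cat failed\n";
    return scalar @keep;
}

# Run only when both file names are given on the command line.
if (@ARGV == 2) {
    my $kept = remove_blank_pages(@ARGV);
    print "kept $kept page(s)\n";
}
```

Call it with an input and an output file name, for example `perl remove_blank.pl in.pdf out.pdf` (the script name is just an example). A page is dropped only when both checks agree, so a small page that still has text, or a textless page with a large image, is kept.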
I hope this helps.