I have a working Perl script that scans a directory and uses imgsize (http://dktools.sourceforge.net/imgsize.html) to get the width, etc. of PNG files. Does anyone have any tips for speeding up this process? Right now it averages 5 minutes per 1000 files. I was wondering whether the code could be optimized somehow. Thanks.
use strict;
use warnings;
use File::Find;

my @files;
my $directory = '/Graphics/';
my $output_file = '/output_file';
my $max_height = 555;
my $count = 0;

open ( OUTPUT, '>>', $output_file );
find( \&wanted, $directory );

foreach my $file ( @files ) {
    if ( $file =~ /\.png$/ ) {
        my $height = `imgsize $file | cut -d\'\"\' -f4`;
        if ( $height > $max_height ) {
            print OUTPUT "$file\n";
        }
        $count++;
        my $int_check = $count/1000;
        if ( $int_check !~ /\D/ ) {
            print "processed: $count\n";
        }
    }
}

print "total: $count\n";
close ( OUTPUT );
exit;

sub wanted {
    push @files, $File::Find::name;
    return;
}
Solution: It turns out that I was able to use the Image::Info module. I went from processing 1000 images every 5 minutes to 1000 every 12 seconds. Here’s the relevant snippet of code, if anyone is interested:
use Image::Info qw(image_info);

foreach my $file ( @files ) {
    if ( $file =~ /\.png$/ ) {
        my $output = image_info($file);
        my $height = $output->{height};
        if ( $height > $max_height ) {
            print OUTPUT "$file\n";
        }
        $count++;
        my $int_check = $count/1000;
        if ( $int_check !~ /\D/ ) {
            print "processed: $count\n";
        }
    }
}
The Perl code you’ve shown is likely not the culprit. You can profile it with Devel::NYTProf, as @choroba suggested. But I’d bet money that most of the time goes to forking two external processes per image (imgsize and cut). You should look into Perl modules that can retrieve the image’s height without running any external process. Modules like Image::Info come to mind.
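To illustrate what "without running any external process" can look like, here is a minimal sketch that reads the height straight from the PNG header using only core Perl. It is not Image::Info and not the code from the question: `png_height` is a hypothetical helper name, and the sketch assumes a well-formed PNG whose first chunk is IHDR (8-byte signature, 4-byte chunk length, 4-byte chunk type, then width and height as 32-bit big-endian integers). A module such as Image::Info is likely the more robust choice and also handles other formats.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the height of a
# PNG in pixels by reading its IHDR chunk, or nothing if the file cannot be
# opened or does not look like a PNG. No external process is started.
sub png_height {
    my ($file) = @_;

    open my $fh, '<:raw', $file or return;
    my $header;
    my $read = read $fh, $header, 24;    # signature + start of IHDR
    close $fh;
    return unless defined $read && $read == 24;

    # Bytes 0-7: PNG signature; 8-11: chunk length; 12-15: chunk type.
    return unless substr( $header, 0, 8 ) eq "\x89PNG\r\n\x1a\n";
    return unless substr( $header, 12, 4 ) eq 'IHDR';

    # Bytes 16-19: width; 20-23: height (both 32-bit big-endian).
    my ( $width, $height ) = unpack 'N N', substr( $header, 16, 8 );
    return $height;
}
```

In the loop from the question, this could stand in for the backtick call, for example `my $height = png_height($file);` followed by `next unless defined $height;` to skip unreadable files.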