I am using Perl's stat() function to get the size of a directory and its subdirectories. I have a list of about 20 parent directories, each containing a few thousand nested subdirectories, and every subdirectory holds a few hundred entries.
The main computing part of the script looks like this:
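sub getDirSize {
    my ($dir) = @_;    # directory to measure
    my $dirSize = 0;
    # Entries of $dir; glob skips dotfiles and assumes $dir has no whitespace.
    my @dirContent = glob("$dir/*");
    foreach my $entry (@dirContent) {
        if (-f $entry) {
            # Element 7 of the stat() list is the size in bytes.
            my $size = (stat($entry))[7];
            $dirSize += $size;
        } elsif (-d $entry) {
            # Recurse into the subdirectory and add its total.
            $dirSize += getDirSize($entry);
        }
    }
    return $dirSize;
}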
The script runs for more than an hour, and I want to make it faster.
I tried the shell du command, but its output (converted to bytes) is not accurate, and it is also quite time-consuming.
I am working on HP-UX 11i v1.
I once faced a similar problem, and used a parallelization approach to speed it up. Since you have ~20 top-tier directories, this might be a pretty straightforward approach for you to try.
Split your top-tier directories into several groups (how many groups is best is an empirical question), call fork() a few times, and analyze directory sizes in the child processes. At the end of each child process, write its results to a temporary file. When all the children are done, read the results back from the files and process them.
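The steps above could be sketched roughly as follows. This is only a minimal illustration, not the code I actually used: the names dir_size and parallel_sizes, the worker count of 4, and the "size<TAB>directory" result format are all assumptions made for the example, and it assumes directory names contain no newlines. It uses File::Find for the traversal instead of the hand-written recursion.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical helper: sum the sizes of all regular files under $dir.
# lstat is used, so symlinks are neither followed nor counted.
sub dir_size {
    my ($dir) = @_;
    my $total = 0;
    find(sub {
        my @st = lstat($_);                # one lstat per entry
        $total += $st[7] if @st && -f _;   # -f _ reuses that lstat result
    }, $dir);
    return $total;
}

# Hypothetical driver: split @dirs into at most $workers groups, fork one
# child per group, and collect the results through temporary files.
sub parallel_sizes {
    my ($workers, @dirs) = @_;
    $workers = 1 if $workers < 1;
    my $tmp = tempdir(CLEANUP => 1);

    # Deal the directories round-robin into groups.
    my @groups;
    push @{ $groups[$_ % $workers] }, $dirs[$_] for 0 .. $#dirs;

    my @pids;
    for my $i (0 .. $#groups) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: measure its group and write one line per directory.
            open(my $out, '>', "$tmp/$i.out") or die "open: $!";
            print {$out} dir_size($_), "\t$_\n" for @{ $groups[$i] };
            close($out) or die "close: $!";
            POSIX::_exit(0);   # leave without running the parent's cleanup
        }
        push @pids, $pid;
    }

    # Parent: wait for every child, then read the result files back.
    waitpid($_, 0) for @pids;
    my %size;
    for my $i (0 .. $#groups) {
        open(my $in, '<', "$tmp/$i.out") or die "open: $!";
        while (my $line = <$in>) {
            chomp $line;
            my ($bytes, $dir) = split /\t/, $line, 2;
            $size{$dir} = $bytes;
        }
        close($in);
    }
    return %size;
}

# Usage: perl script.pl /dir1 /dir2 ...  (prints nothing without arguments)
if (@ARGV) {
    my %size = parallel_sizes(4, @ARGV);
    printf "%s\t%d\n", $_, $size{$_} for sort keys %size;
}
```

Whether this helps depends on where the time goes: if the disk is already the bottleneck, extra processes may not gain much, so it is worth trying a few different worker counts.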