So I’m running perl 5.10 on a core 2 duo macbook pro compiled with threading support: usethreads=define, useithreads=define. I’ve got a simple script to read 4 gzipped files containing aroud 750000 lines each. I’m using Compress::Zlib to do the uncompressing and reading of the files. I’ve got 2 implementations the only difference between them being one includes use threads. Other than that both script run the same subroutine to do the reading. Hence in psuedocode the non-threading program does this:
read_gzipped(file1);
read_gzipped(file2);
read_gzipped(file3);
read_gzipped(file4);
The threaded version goes like this:
my thr0 = threads->new(\$read_gzipped,'file1')
my thr1 = threads->new(\$read_gzipped,'file1')
my thr2 = threads->new(\$read_gzipped,'file1')
my thr3 = threads->new(\$read_gzipped,'file1')
thr0->join()
thr1->join()
thr2->join()
thr3->join()
Now the threaded version is actually running almost 2 times slower then the non-threaded script. This obviously was not the result I was hoping for. Can anyone explain what I’m doing wrong here?
My guess is the bottleneck for GZIP operations is disk access. If you have four threads competing for disk access on platter harddisk, that slows things down considerably. The disk head will have to move to different files in rapid succession. If you just process one file at a time, the head can stay near that file, and the disk cache will be more accurate.