Here’s a script that launches 10 processes, each writing 100,000 lines to its STDOUT, which is inherited from the parent:
#!/usr/bin/env perl
# buffering.pl
use 5.10.0;
use strict;
use warnings FATAL => "all";
use autodie;
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(4);
$|=1; # don't think this does anything with syswrite...
# start 10 jobs which write 100,000 lines each
for (1 .. 10 ) {
    $pm->start and next;
    for my $j (1 .. 100_000) {
        syswrite(\*STDOUT,"$j\n");
    }
    $pm->finish;
}
$pm->wait_all_children;
If I pipe to another process, all is well:
$ perl buffering.pl | wc -l
1000000
But if I redirect to a file on disk, the syswrites clobber each other:
$ perl buffering.pl > tmp.txt ; wc -l tmp.txt
457584 tmp.txt
What’s more, if I open write-file handles in the child processes and write directly to tmp.txt:
#!/usr/bin/env perl
# buffering2.pl
use 5.10.0;
use strict;
use warnings FATAL => "all";
use autodie;
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(4);
$|=1;
for (1 .. 10) {
    $pm->start and next;
    open my $fh, '>', 'tmp.txt';
    for my $j (1 .. 100_000) {
        syswrite($fh,"$j\n");
    }
    close $fh;
    $pm->finish;
}
$pm->wait_all_children;
tmp.txt has 100,000 lines, as expected:
$ perl buffering2.pl; wc -l tmp.txt
100000 tmp.txt
So redirection via ‘>’ to disk has some sort of buffering but redirection to a process doesn’t? What’s the deal?
When you redirect the whole perl script, you get one file descriptor (created by the shell when you do > tmp.txt and inherited as stdout by perl), which is dup’d to each child. Those copies all refer to the same open file in the kernel, so the children share a single file offset. When you explicitly open in each child, you get different file descriptors (not dups of the original), each with its own offset. You should be able to replicate the shell redirection case if you hoist open my $fh, '>', 'tmp.txt' out of your loop.

The pipe case works because you’re talking to a pipe and not a file, and a pipe has no notion of an offset that can be inadvertently shared in the kernel as I described above.
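To make the hoisting suggestion concrete, here is a minimal sketch rather than a definitive test. It assumes plain fork in place of Parallel::ForkManager so that it runs with core modules only, and the helper run_writers and the file name tmp_shared.txt are made up for illustration. The parent opens the file once and every child writes through the inherited handle, first in '>' mode (one shared offset, like the shell's > tmp.txt) and then in '>>' mode, where O_APPEND makes the kernel move to the end of the file before each write.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: open $path once in the parent with $mode, fork $kids
# children that all syswrite through the inherited handle, then return the
# number of lines that ended up in the file.
sub run_writers {
    my ($mode, $path, $kids, $lines) = @_;
    unlink $path;    # start fresh; failure just means the file did not exist
    open(my $fh, $mode, $path) or die "open $path: $!";

    my @pids;
    for my $k (1 .. $kids) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: $fh refers to the parent's open file, so the file
            # offset is shared with the parent and all sibling children.
            for my $j (1 .. $lines) {
                defined syswrite($fh, "$j\n") or die "syswrite: $!";
            }
            POSIX::_exit(0);    # leave without flushing inherited stdio buffers
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;
    close($fh) or die "close $path: $!";

    # Count the lines that survived.
    open(my $in, '<', $path) or die "open $path: $!";
    my $count = 0;
    while (defined(my $line = <$in>)) {
        $count++;
    }
    close $in;
    return $count;
}

my $path = 'tmp_shared.txt';    # illustrative file name
for my $mode ('>', '>>') {
    my $n = run_writers($mode, $path, 10, 10_000);
    printf "%-2s %d lines\n", $mode, $n;
}
unlink $path;
```

Whether the '>' run actually loses lines depends on the system, so treat its count as something to observe, not a guaranteed result. The '>>' run should report all 100,000 lines on a local filesystem. The same idea applies to the original script without code changes: running perl buffering.pl >> tmp.txt gives every child an append-mode stdout.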