I’ve got about 400’000 files that need some text to be replaced. I tried

Question

Editorial Team

Asked: June 15, 2026

I’ve got about 400’000 files that need some text to be replaced.

I tried the following Perl script:

@files = <*.html>;

foreach $file (@files) {
    `perl -0777 -i -pe 's{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsmi;' $file`;

    `perl -0777 -i -pe 's{<div[^>]+?class="generic"[^>]*>[^\s]*<small>[^\s]*Author.*?</div>.*?</div>.*?</div>.*?</div>.*?</div>}{}gsmi;' $file`;

    `perl -0777 -i -pe 's{<script[^>]+?src="javascript.*?"[^>]*>.*?</script>}{}gsmi;' $file`;

    `perl -p -i -e 's/.css.html/.css/g;' $file`;
}

I don’t have deep Perl knowledge, but the script runs too slowly (it updates only about 180 files per day).

Is there a way to speed it up?

Thank you in advance!

PS: When I tested it on a smaller number of files, I noticed much better performance…

1 Answer

Editorial Team · Answer 1 · 2026-06-15T03:59:08+00:00

First off, loading 400,000 file names into an array uses memory you don’t need to spend. You can iterate through the files one at a time instead, for example with:

File::Find
opendir + while (readdir($dh)) (does not load the entire list)
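
As a rough illustration of the File::Find option, here is a minimal sketch that visits each HTML file as it is found rather than building a list first. The helper name each_html_file and the counting callback are hypothetical, not from the question. Note that, unlike the glob <*.html>, File::Find also descends into subdirectories.

```perl
use strict;
use warnings;
use File::Find;

# Call $callback->($path) for every *.html file under $dir, one at a time.
# (each_html_file is a hypothetical helper name used for illustration.)
sub each_html_file {
    my ($dir, $callback) = @_;
    find(
        {
            no_chdir => 1,    # stay in the current directory; $_ holds the full path
            wanted   => sub {
                return unless -f $_ && /\.html\z/i;
                $callback->($_);
            },
        },
        $dir,
    );
}

# Example: count the files without ever holding the whole list in memory.
my $count = 0;
each_html_file('.', sub { $count++ });
print "Visited $count HTML files\n";
```

In real use, the callback would do the per-file substitution work instead of counting.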

Second, each backtick command spawns a shell and a new perl process, which is very inefficient, and your script does that four times per file. You could just open each file normally, slurp it, apply all the substitutions, and then write the result back to the same file name. E.g.

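opendir my $dh, "." or die $!;
while (defined(my $file = readdir $dh)) {
    next unless -f $file && $file =~ /\.html$/;   # skip ".", "..", and non-HTML entries
    open my $in, "<", $file or die "$file: $!";
    my $text = do { local $/; <$in> };            # slurp file
    close $in;
    $text =~ s/....//g;                           # do your substitutions
    open my $out, ">", $file or die "$file: $!";
    print $out $text;                             # overwrite file, much like the -i switch
    close $out or die "$file: $!";
}
closedir $dh;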
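
To make that concrete, here is a sketch that applies all four substitutions from the question inside one perl process, with one read and at most one write per file. The sub names clean_html and process_file are hypothetical, as is taking the directory from the command line. One deliberate difference, which is an assumption about intent: the dots in the last pattern are escaped so that it only matches a literal ".css.html".

```perl
use strict;
use warnings;

# Apply the four substitutions from the question to one in-memory string.
# (clean_html and process_file are hypothetical names used for illustration.)
sub clean_html {
    my ($text) = @_;
    $text =~ s{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsmi;
    $text =~ s{<div[^>]+?class="generic"[^>]*>[^\s]*<small>[^\s]*Author.*?</div>.*?</div>.*?</div>.*?</div>.*?</div>}{}gsmi;
    $text =~ s{<script[^>]+?src="javascript.*?"[^>]*>.*?</script>}{}gsmi;
    $text =~ s{\.css\.html}{.css}g;    # assumption: literal dots were intended
    return $text;
}

# Read a file, clean it, and write it back only if something changed.
sub process_file {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close $in;
    my $cleaned = clean_html($text);
    return 0 if $cleaned eq $text;        # nothing to do, skip the write
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $cleaned;
    close $out or die "Cannot close $file: $!";
    return 1;
}

# Directory to process: first command-line argument, or the current directory.
my $dir = @ARGV ? $ARGV[0] : '.';
opendir my $dh, $dir or die "Cannot open $dir: $!";
while (defined(my $name = readdir $dh)) {
    my $path = "$dir/$name";
    next unless $name =~ /\.html\z/ && -f $path;
    process_file($path);
}
closedir $dh;
```

Skipping the write when nothing changed also makes the script cheap to re-run after an interruption, since files that were already cleaned are only read.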

Lastly, using regexes to edit HTML is not ideal. It might work for your case, but it may be worthwhile to invest some time in learning an HTML parser. I’m not sure how suitable it would be for this particular case, but it’s worth looking into to make your code more robust.
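
For comparison, here is a sketch of what the parser approach could look like with Mojo::DOM. This assumes the Mojolicious distribution is installed from CPAN (it is not a core module), and the sub name strip_with_parser is hypothetical. It only covers the user-info div and the script tags; the "generic"/Author block would need a selector that matches your actual markup.

```perl
use strict;
use warnings;
use Mojo::DOM;    # assumption: Mojolicious is installed from CPAN

# Remove the unwanted elements by selector instead of by regex.
# (strip_with_parser is a hypothetical name used for illustration.)
sub strip_with_parser {
    my ($html) = @_;
    my $dom = Mojo::DOM->new($html);
    $dom->find('div#user-info')->each(sub { $_->remove });
    $dom->find('script[src^="javascript"]')->each(sub { $_->remove });
    return $dom->to_string;
}

print strip_with_parser('<p>keep</p><div id="user-info"><div>nested</div></div>'), "\n";
```

The practical difference is nesting: the regex .*?</div> stops at the first closing tag, so a div nested inside user-info would leave a stray </div> behind, while the parser removes the whole element.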
