Question about code performance: I’m trying to run ~25 regex rules against a ~20 GB text file


Asked: June 8, 2026


I’m trying to run ~25 regex rules against a ~20 GB text file. The script should write the matches to text files, with each regex rule producing its own output file. See the pseudocode below:

regex_rules=~/Documents/rulesfiles/regexrulefile.txt
for tmp in *.unique20gbfile.suffix; do
    # Each $line in the rules file contains a complete grep command, e.g.,
    # egrep -i '(^| )justin ?bieber|(^| )selena ?gomez'
    # $rname is a unique rule name generated by a separate bash function
    # exported to the current shell.
    while IFS= read -r line; do
        # eval re-parses the quoting inside the rule; the escaped \"\$tmp\"
        # is expanded by eval itself, so file names with spaces survive.
        eval "$line \"\$tmp\"" > ~/outputdir/"$tmp.$rname.filter.piped" &
    done < "$regex_rules"
done
wait  # block until every background grep has finished

A couple of thoughts:

Is there a way to read through the text file just once, evaluating all the rules and splitting the matches into individual files in one go? Would this be faster?
Is there a different tool I should be using for this job?

Thanks.
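For the first question, one way to read the big file only once is to let awk hold all the rules and write each matching line to a per-rule file as it goes. This is a minimal sketch under several assumptions: the helper name `split_by_rules` and the `name<TAB>regex` rules format are hypothetical, case-insensitivity is approximated by lowercasing each input line (so the regexes themselves must be written in lowercase), and the awk in use must be able to keep ~25 output files open at once (gawk, for instance, can).

```shell
#!/bin/sh
# split_by_rules RULES INPUT OUTDIR  (hypothetical helper)
# Reads INPUT once; each line is tested against every rule and written to
# OUTDIR/<name>.filter for each rule it matches.
# Assumed RULES format: "name<TAB>regex" per line, regexes in lowercase.
split_by_rules() {
    rules=$1 input=$2 outdir=$3
    mkdir -p "$outdir" || return 1
    awk -v outdir="$outdir" '
        BEGIN { FS = "\t" }
        # First file (the rules): store each rule name and its regex.
        NR == FNR { name[++n] = $1; re[n] = $2; next }
        # Second file (the big input): lowercase the line so matching is
        # case-insensitive, then test it against every stored regex.
        {
            line = tolower($0)
            for (i = 1; i <= n; i++)
                if (line ~ re[i])
                    print > (outdir "/" name[i] ".filter")
        }
    ' "$rules" "$input"
}
```

awk still evaluates every regex against every line, so this saves I/O rather than matching work; grep's matcher is often faster than awk's, so it is worth timing both on a sample of the file first. Also note that the `NR == FNR` idiom misreads the input as rules if the rules file is empty, and a rule with no matches produces no output file.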


1 Answer

Editorial Team · Answer 1 · 2026-06-08T16:21:17+00:00


This is exactly what grep’s -f option is for. Reduce your regexrulefile.txt to just the regexps, one per line, and run:

grep -E -f regexrulefile.txt the_big_file

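This scans the big file only once and writes every line that matches any of the rules to a single output stream. You can then run your per-rule loop over that much smaller output to split the matches into separate files. As long as the combined matches are small compared with the 20 GB input, this should be a big performance win, because the expensive pass over the big file happens once instead of once per rule. If your rules are case-insensitive (like the egrep -i example in the question), add -i to the command.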
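The two-pass approach described above can be sketched as follows. It assumes GNU or BSD grep, a patterns file with one extended regex per line and no blank lines (a blank pattern matches every line), and a hypothetical output naming scheme of `rule1.filter`, `rule2.filter`, ... in place of the question’s `$rname`. The function name `two_pass_split` is also hypothetical.

```shell
#!/bin/sh
# two_pass_split PATTERNS INPUT OUTDIR  (hypothetical helper)
# Pass 1: one scan of the big INPUT with all patterns at once (grep -f),
#         keeping only lines that match at least one rule.
# Pass 2: run each pattern separately over that much smaller file.
# PATTERNS holds one extended regex per line; outputs are named by rule
# number (rule1.filter, rule2.filter, ...).
two_pass_split() {
    patterns=$1 input=$2 outdir=$3
    mkdir -p "$outdir" || return 1
    candidates=$outdir/all_matches.txt

    # The only pass over the big file. -i mirrors the case-insensitive
    # rules in the question; drop it if your rules are case-sensitive.
    grep -E -i -f "$patterns" "$input" > "$candidates"

    n=0
    while IFS= read -r re; do
        n=$((n + 1))
        # -e keeps a pattern that starts with "-" from being read as an option.
        grep -E -i -e "$re" "$candidates" > "$outdir/rule$n.filter"
    done < "$patterns"
}
```

Pass 2 runs sequentially here; because the candidate file is small, that is usually fine, but those greps could also be backgrounded with `&` and collected with `wait`.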
