I have to grep from a file (5MB) containing specific strings the same strings

Question

Asked: June 18, 2026 (2026-06-18T02:07:13+00:00)

I have a 5MB file containing specific strings, and I need to grep for those same strings (and the other information on their lines) in a big file (27GB).
To speed up the analysis I split the 27GB file into 1GB files and then applied the following script (with the help of some people here). However, it is not very efficient: producing a 180KB output file takes 30 hours!

Here’s the script. Is there a more appropriate tool than grep? Or a more efficient way to use grep?

#!/bin/bash

NR_CPUS=4
count=0


for z in `echo {a..z}` ;
do
 for x in `echo {a..z}` ;
 do
  for y in `echo {a..z}` ;
  do
   for ids in $(cat input.sam|awk '{print $1}');  
   do 
    grep $ids sample_"$z""$x""$y"|awk '{print $1" "$10" "$11}' >> output.txt &
    let count+=1
    [[ $((count%NR_CPUS)) -eq 0 ]] && wait
   done
  done #&
 done
done


1 Answer

Editorial Team · Answer 1 · 2026-06-18T02:07:14+00:00

A few things you can try:

1) You are reading input.sam multiple times (once for every sample file). It only needs to be read once, before your first loop starts. Save the ids to a temporary file which will be read by grep.

2) Prefix your grep command with LC_ALL=C to use the C locale instead of UTF-8. This will speed up grep.

3) Use fgrep (equivalent to grep -F; recent GNU grep releases flag the fgrep name as obsolescent) because you’re searching for fixed strings, not regular expressions.

4) Use -f to make grep read patterns from a file, rather than using a loop.

5) Don’t write to the output file from multiple processes as you may end up with lines interleaving and a corrupt file.

After making those changes, this is what your script would become:

awk '{print $1}' input.sam > idsFile.txt
for z in {a..z}
do
 for x in {a..z}
 do
  for y in {a..z}
  do
   LC_ALL=C fgrep -f idsFile.txt sample_"$z""$x""$y" | awk '{print $1,$10,$11}'
  done >> output.txt
 done
done
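
One caveat with fixed-string grep is that it matches an id anywhere in the line, so an id such as read1 would also select a line for read10. If the ids are only meant to match the first column (an assumption; the question does not say so explicitly), a hash lookup in awk is a possible alternative. The following is a minimal sketch, not a drop-in replacement: filter_by_first_field is a hypothetical helper name, and the glob sample_??? in the usage comment assumes the split files follow the sample_aaa ... sample_zzz naming used above.

```shell
#!/bin/bash
# Sketch: exact-match lookup on column 1, reading every file only once.
# filter_by_first_field is a hypothetical helper name, not part of the original script.
filter_by_first_field() {
    ids_source=$1   # file whose first column holds the ids (e.g. input.sam)
    shift           # remaining arguments: the files to search
    # NR == FNR is true only while awk reads the first file
    # (this idiom assumes the ids file is not empty).
    LC_ALL=C awk '
        NR == FNR { ids[$1]; next }         # first file: remember every id
        $1 in ids { print $1, $10, $11 }    # other files: keep rows whose id is known
    ' "$ids_source" "$@"
}

# Example call (assumes the split files match the glob sample_???):
# filter_by_first_field input.sam sample_??? > output.txt
```

Because a single awk process writes all the output, there is no interleaving problem, and each sample file is scanned exactly once regardless of how many ids there are.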

Also, check out GNU Parallel which is designed to help you run jobs in parallel.
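
As a rough illustration of what that could look like, here is a sketch that runs one grep per sample file with a limited number of simultaneous jobs. It assumes GNU Parallel is installed and that idsFile.txt was created as in the script above; parallel_extract is a hypothetical helper name. Whether this is actually faster depends on whether the disk or the CPU is the bottleneck, so treat it as something to measure rather than a guaranteed win.

```shell
#!/bin/bash
# Sketch: one grep per sample file, several at a time, using GNU Parallel.
# parallel_extract is a hypothetical helper name; GNU Parallel must be on PATH.
parallel_extract() {
    ids_file=$1   # one id per line, e.g. idsFile.txt
    jobs=$2       # number of simultaneous jobs, e.g. 4
    shift 2       # remaining arguments: the sample files
    # {} is replaced by each file name in turn. GNU Parallel buffers the output
    # of each job, so lines from different jobs are not mixed together, and
    # -k prints the results in the order of the input files.
    # Assumes the path in $ids_file contains no single quotes.
    parallel -j "$jobs" -k \
        "LC_ALL=C grep -F -f '$ids_file' {} | awk '{print \$1, \$10, \$11}'" \
        ::: "$@"
}

# Example call (assumes the split files match the glob sample_???):
# parallel_extract idsFile.txt 4 sample_??? > output.txt
```

Here the shell redirects the combined output once, so only one stream writes to output.txt, which sidesteps the interleaving issue from point 5.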
