I have a large text file (over 70 MB) and need to count the

Question

Asked: May 12, 2026 (2026-05-12T19:40:50+00:00)

I have a large text file (over 70 MB) and need to count the number of times a character sequence occurs in the file. I can find plenty of scripts that do this, but none of them take into account that a sequence can start on one line and finish on the next. For the sake of efficiency (I am actually processing far more than one file), I cannot preprocess the files to remove newlines.

Example:
If I am searching for “thisIsTheSequence”, the following file would have 3 matches:

asdasdthisIsTheSequence
asdasdasthisIsT
heSequenceasdasdthisIsTheSequ
encesadasdasda

Thanks for the help.

1 Answer

Editorial Team · Answer 1 · 2026-05-12T19:40:50+00:00

A single awk script will do. Since you will be processing a huge file, chaining multiple pipes can slow things down.

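#!/bin/bash
# Count a literal sequence, including matches that are split across lines.
awk -v search="thisIsTheSequence" '
BEGIN { keep = length(search) - 1 }
{
  s = s $0                                  # join lines without the newline
  # index() matches literally, so regex metacharacters in search are safe
  while ((i = index(s, search)) > 0) {
    total++
    s = substr(s, i + length(search))       # drop everything up to the match
  }
  # keep only the tail that could start a match finishing on a later line,
  # so the buffer never grows with the size of the file
  if (length(s) > keep) s = substr(s, length(s) - keep + 1)
}
END { print "total count: " total + 0 }' file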

Output:

$ more file
asdasdthisIsTheSequence
asdasdasthisIsT
heSequenceasdasdthisIsTheSequ
encesadasdasdaasdasdthisIsTheSequence
asdasdasthisIsT
heSequenceasdasdthisIsTheSequ
encesadasdasda
asdasdthisIsTheSequence
asdasdasthisIsT
heSequenceasdasdthisIsTheSequ
encesadasdasda

$ ./shell.sh
total count: 9
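
Since the question mentions many files, the awk approach above can be extended to take several files in one run. The following is only a sketch under a few assumptions: `count_seq` is a hypothetical helper name, the sequence is treated as a literal string (not a regex), matches are counted without overlap, and the `FILE: COUNT` output format is made up for illustration. It resets its state whenever awk moves to a new input file, so a match never spans two files.

```shell
#!/bin/bash
# count_seq PATTERN FILE...   (hypothetical helper, not from the original answer)
# Prints "FILE: COUNT" for each non-empty file; empty files are skipped by awk.
count_seq() {
  local pattern=$1
  shift
  # An empty pattern has no meaningful count, so refuse it.
  [ -n "$pattern" ] || return 2
  # Note: awk -v interprets backslash escapes in the assigned value.
  awk -v search="$pattern" '
    function report() { printf "%s: %d\n", name, total }
    BEGIN { keep = length(search) - 1 }
    FNR == 1 {                       # first line of each input file
      if (NR > 1) report()           # print the result for the previous file
      name = FILENAME; total = 0; s = ""
    }
    {
      s = s $0                       # join lines without the newline
      while ((i = index(s, search)) > 0) {
        total++
        s = substr(s, i + length(search))
      }
      # keep only the tail that could begin a match ending on a later line
      if (length(s) > keep) s = substr(s, length(s) - keep + 1)
    }
    END { if (NR > 0) report() }     # result for the last file
  ' "$@"
}
```

It would be called as `count_seq thisIsTheSequence file1 file2`, printing one line per file. Running one awk process over all files avoids starting a new process for each of them, which is in the spirit of the single-script advice above.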
