I have a large text file (over 70 MB) and need to count the number of times a character sequence occurs in it. I can find plenty of scripts to do this, but none of them take into account that a sequence can start on one line and finish on another. For the sake of efficiency (I am actually processing many more than one file), I cannot preprocess the files to remove newlines.
Example:
If I am searching for “thisIsTheSequence”, the following file would have 3 matches:
asdasdthisIsTheSequence
asdasdasthisIsT
heSequenceasdasdthisIsTheSequ
encesadasdasda
Thanks for the help.
Just one awk script will do, since you will be processing a huge file. Chaining multiple pipes can slow things down.
output
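For comparison, here is a minimal Python sketch of the same single-pass idea. It is not the awk script referred to above. It reads one line at a time, strips the newline, and carries the last `len(seq) - 1` characters over to the next line so that a match split across a line break is still seen. The function name `count_across_lines` is made up for illustration, and the sketch assumes that overlapping occurrences should each be counted (this makes no difference for a sequence like the one in the question).

```python
import io


def count_across_lines(lines, seq):
    """Count occurrences of seq in the text formed by joining lines
    with their newlines removed, without loading the whole file."""
    if not seq:
        raise ValueError("seq must be non-empty")
    keep = len(seq) - 1   # longest tail that cannot hold a full match
    tail = ""             # carry-over from the previous line(s)
    total = 0
    for line in lines:
        buf = tail + line.rstrip("\r\n")
        pos = buf.find(seq)
        while pos != -1:
            total += 1
            # Advance by one character so overlapping matches are counted.
            pos = buf.find(seq, pos + 1)
        # The tail is shorter than seq, so no match is counted twice.
        tail = buf[-keep:] if keep else ""
    return total


if __name__ == "__main__":
    # The example file from the question, held in memory for the demo.
    sample = io.StringIO(
        "asdasdthisIsTheSequence\n"
        "asdasdasthisIsT\n"
        "heSequenceasdasdthisIsTheSequ\n"
        "encesadasdasda\n"
    )
    print(count_across_lines(sample, "thisIsTheSequence"))  # prints 3
```

For a real file, pass an open file object instead of the `StringIO`, e.g. `with open(path) as f: count_across_lines(f, "thisIsTheSequence")`. Memory use stays bounded by the longest line plus the carried tail.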