I have a long text (about 5 MB) and a second, much shorter text called the pattern (around 2000 characters).
The task is to find every part of the genome pattern that is 15 characters or longer and also occurs in the long text.
Example:
long text:
ACGTACGTGTCA
AAAACCCCGGGGTTTTA
GTACCCGTAGGCGTAT (and much longer)
pattern:
ACGGTATTGAC
AAAACCCCGGGGTTTTA
TGTTCCCAG
I’m looking for an efficient algorithm that is also easy to understand and implement.
A bonus would be a way to implement this with just char arrays in C++, if that’s possible at all.
Stand back, I’m gonna live-code:
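What follows is a minimal sketch of one possible approach, often called seed-and-extend, and not necessarily the only or the fastest one. The idea: index every 15-character window of the short pattern in a hash map, slide a window of the same size over the long text, and extend each hit to the right as far as the characters keep agreeing. The names `Match` and `find_common_substrings` are made up for illustration. The sketch assumes both inputs are handed over as one contiguous run of characters (line breaks, if present, are compared like any other character) and that C++17 is available for `std::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_map>
#include <vector>

// One maximal shared run: text[text_pos, text_pos + length) equals
// pattern[pattern_pos, pattern_pos + length).
struct Match {
    std::size_t text_pos;
    std::size_t pattern_pos;
    std::size_t length;
};

// Hypothetical helper (not from any library): reports every maximal run of
// at least min_len characters that occurs in both text and pattern.
std::vector<Match> find_common_substrings(std::string_view text,
                                          std::string_view pattern,
                                          std::size_t min_len = 15) {
    assert(min_len > 0);
    std::vector<Match> matches;
    if (text.size() < min_len || pattern.size() < min_len) return matches;

    // Index every min_len-character window of the short pattern.
    // The keys are views into pattern, so nothing is copied.
    std::unordered_map<std::string_view, std::vector<std::size_t>> index;
    for (std::size_t j = 0; j + min_len <= pattern.size(); ++j)
        index[pattern.substr(j, min_len)].push_back(j);

    // Slide a window of the same size over the long text.
    for (std::size_t i = 0; i + min_len <= text.size(); ++i) {
        auto hit = index.find(text.substr(i, min_len));
        if (hit == index.end()) continue;
        for (std::size_t j : hit->second) {
            // If the preceding characters also agree, this window is just
            // the continuation of a run already reported further left.
            if (i > 0 && j > 0 && text[i - 1] == pattern[j - 1]) continue;
            // Extend the seed to the right while the characters agree.
            std::size_t len = min_len;
            while (i + len < text.size() && j + len < pattern.size() &&
                   text[i + len] == pattern[j + len])
                ++len;
            matches.push_back({i, j, len});
        }
    }
    return matches;
}
```

As for the char-array bonus: a `std::string_view` can wrap a plain `char` buffer without copying, e.g. `std::string_view(buf, len)`, so the 5 MB text can stay in whatever array it was read into. Building the index costs one hash insert per pattern position (roughly 2000 of them), and the scan costs one hash lookup per text position, which should be fine for 5 MB. Highly repetitive sequences can produce many hits per window, so treat the running time as typical-case rather than guaranteed.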