Given two strings, I want to find all common substrings of a specified length, but allowing one character to be different.
For example, if s1 is 'ATCAGC', s2 is 'ATAATCGAC', and the specified length is 3, then I’d want output along these lines:
ATC from s1 matches ATA, ATC from s2
TCA from s1 matches TAA, TCG from s2
Questions
- Can I do so with a simple regex?
- If not, is there a module for this in Perl?
The first Google result for “perl hamming distance” is a PerlMonks thread that mentions Text::LevenshteinXS, various typical implementations, and a cute xor trick.
You should skim the Wikipedia article on string metrics if Levenshtein distance or Hamming distance aren’t familiar.
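To make the xor idea concrete, here is a minimal sketch of how it could be applied to the question's example. It is not necessarily the code from that thread. The sub names (hamming, windows, near_matches) and the $max parameter are my own, picked for illustration. It assumes plain single-byte strings such as DNA letters, and that the "bitwise" feature is not enabled, so ^ on two strings is a string xor.

```perl
use strict;
use warnings;

# Hamming distance between two equal-length byte strings.
# Equal characters xor to "\0", so count the bytes that are not "\0".
sub hamming {
    my ($x, $y) = @_;
    die "hamming: lengths differ\n" unless length($x) == length($y);
    my $diff = $x ^ $y;          # string xor: both operands are strings
    return $diff =~ tr/\0//c;    # count the non-NUL bytes
}

# All substrings of length $k, in order of starting position.
sub windows {
    my ($s, $k) = @_;
    return map { substr($s, $_, $k) } 0 .. length($s) - $k;
}

# For each window of $s1, collect the windows of $s2 that differ in at
# most $max positions. Returns a list of [ $window, @hits ] array refs.
sub near_matches {
    my ($s1, $s2, $k, $max) = @_;
    my @w2 = windows($s2, $k);
    my @out;
    for my $w1 (windows($s1, $k)) {
        my @hits = grep { hamming($w1, $_) <= $max } @w2;
        push @out, [ $w1, @hits ] if @hits;
    }
    return @out;
}

for my $row (near_matches('ATCAGC', 'ATAATCGAC', 3, 1)) {
    my ($w1, @hits) = @$row;
    print "$w1 from s1 matches ", join(', ', @hits), " from s2\n";
}
```

On the example strings this prints the two lines from the question plus a third, "AGC from s1 matches ATC from s2", since AGC and ATC also differ in only one position. It compares every window of s1 against every window of s2, which is fine for short strings but probably too slow for long sequences.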