I am trying to do this in Python.
I have two sequences:
seq1: ‘A B C D E’
seq2: ‘A R C B E’
I want to count the characters that appear in both seq1 and seq2, but with a constraint.
Suppose I draw a line from A in seq1 to A in seq2, and likewise connect C–C and E–E. If I also connect B–B, that line will CROSS the line linking C–C.
So I want to count EITHER B–B OR C–C, NOT BOTH, since their lines cross, and find the largest number of such non-crossing connections I can make between the two strings.
Is there a way to do this? I am sure what I am trying to do has a name, but I don't know it, which makes searching online for possible methods difficult.
Thank you for the help.
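Sounds like the longest common subsequence (LCS) problem: a common subsequence keeps the matched characters in the same order in both strings, so none of the connecting lines can cross. A simplified version of the dynamic programming algorithm for Levenshtein distance solves this.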
There are plenty of Python implementations of LCS online. The pseudocode on Wikipedia is also trivial to translate to Python.
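For what it's worth, here is a minimal sketch of the standard dynamic programming approach, applied to the two sequences from the question. The function name lcs_length is just a name I picked, and I'm assuming the sequences are split on whitespace so that each letter is one item. It only returns the count, not the matched characters themselves.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of a and b."""
    # prev[j] is the LCS length of the items of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching items extend the best result for both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last item of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Assumption: split on whitespace so each letter is a separate item.
seq1 = "A B C D E".split()
seq2 = "A R C B E".split()
print(lcs_length(seq1, seq2))  # prints 3 (A, then either B or C, then E)
```

Keeping only two rows of the table is enough when all you need is the count. If you also want to know which characters were connected, you would keep the full table and trace back through it.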