I’m working on a matching algorithm for protein sequences. I’m starting with an aligned

Question

0

Asked: June 5, 20262026-06-05T19:04:58+00:00 2026-06-05T19:04:58+00:00

I’m working on a matching algorithm for protein sequences. I’m starting with an aligned

0

I’m working on a matching algorithm for protein sequences. I’m starting with an aligned protein sequence, and I am attempting to convert a mis-aligned sequence into the correctly aligned one. Here is an example:

original aligned sequence: —-AB–C-D—–

mis-aligned sequence: –A–BC—D-

The expected output should look like this:

original aligned sequence: —-AB–C-D—–

mis-aligned sequence: —-AB–C-D—– (both are now the same)

I’m told to be very specific about my problem, but the sequences I’m trying to match are >4000 characters long, and look ridiculous when pasted here. I’ll post two sequences representative of my problem, though, and that should do.

seq="---A-A--AA---A--"
newseq="AA---A--A-----A-----"
seq=list(seq) #changing maaster sequence from string to list
newseq=list(newseq) #changing new sequence from string to list
n=len(seq) #obtaining length of master sequence
newseq.extend('.') #adding a tag to end of new sequence to account for terminal gaps

print(seq, newseq,n) #verification of sequences in list form and length

for i in range(n)
    if seq[i]!=newseq[i]:
        if seq[i] != '-': #gap deletion
            del newseq[i]

        elif newseq[i] != '-':
            newseq.insert(i,'-') #gap insertion


        elif newseq[i] == '-':
            del newseq[i]


old=''.join(seq) #changing list to string
new=''.join(newseq) #changing list to string
new=new.strip('.') #removing tag

print(old) #verification of master-sequence fidelity
print(new) #verification of matching sequence

The output I get from this particular code and set of sequences is:

—A-A–AA—A–

—A-A–A—-A—–A—–

I can’t seem to get the loop to correctly delete unwanted dashes in between the letters more than once, because the rest of the loop iterations are used in an add dash/delete dash pair.
This is a good start to the problems here.

How can I write this loop successfully to obtain my desired output (two identical sequences)

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-05T19:05:00+00:00

I edited your code and it is giving correct output now:

seq="----AB--C-D-----"
newseq="--A--BC---D-"
seq=list(seq) #changing maaster sequence from string to list
newseq=list(newseq) #changing new sequence from string to list
n=len(seq) #obtaining length of master sequence
newseq.extend('.') #adding a tag to end of new sequence to account for terminal gaps

print(seq, newseq,n) #verification of sequences in list form and length
for i in range(len(seq)):
    if seq[i]!=newseq[i]:
       if seq[i]=='-':
           newseq.insert(i,'-')

       elif newseq[i]=='-':
           newseq.insert(i,seq[i])
       else:
           newseq.insert(i,seq[i])

else:
    newseq=newseq[0:len(seq)]

old=''.join(seq) #changing list to string
new=''.join(newseq) #changing list to string
new=new.strip('.') #removing tag

print(old) #verification of master-sequence fidelity
print(new) #verification of matching sequence

output:

----AB--C-D-----
----AB--C-D-----

and for AA---A--A-----A-----:

---A-A--AA---A--
---A-A--AA---A--

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m working on a matching algorithm for protein sequences. I’m starting with an aligned

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply