Using the Smith-Waterman algorithm for a book homework assignment, I made up a table of values. Building the table was easy once I understood how the values are acquired, but now I’m having difficulty with determining the best alignment sequence from the table.
The example table was generated using the formula
min( (i+1, j+1) + penalty,
     (i+1, j) + 2,
     (i, j+1) + 2 )
In the book pseudocode, penalty had a value of 0 if i==j and 1 otherwise.
The first 4 rows and columns look like this, with a penalty of 1 for a mismatch and 2 for a gap:
14 12 10 8
15 13 11 9
16 14 12 10
17 15 13 11
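For reference, the recurrence above could be coded roughly like this in Python. This is only a sketch under assumptions: the function name build_table, the sample strings in the usage line, and the border initialisation (each leftover character costs one gap) are not taken from the book, and the penalty is assumed to compare the characters at positions i and j of the two sequences.

```python
def build_table(x, y, mismatch=1, gap=2):
    """Fill the cost table from the bottom-right corner back to [0][0]."""
    m, n = len(x), len(y)
    opt = [[0] * (n + 1) for _ in range(m + 1)]

    # Assumed border: once one sequence is used up, every remaining
    # character of the other one is paired with a gap.
    for i in range(m - 1, -1, -1):
        opt[i][n] = gap * (m - i)
    for j in range(n - 1, -1, -1):
        opt[m][j] = gap * (n - j)

    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            penalty = 0 if x[i] == y[j] else mismatch
            opt[i][j] = min(
                opt[i + 1][j + 1] + penalty,  # align x[i] with y[j]
                opt[i + 1][j] + gap,          # gap opposite x[i]
                opt[i][j + 1] + gap,          # gap opposite y[j]
            )
    return opt


# Hypothetical input, not the sequences from the homework.
print(build_table("AB", "A")[0][0])  # 2
```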
According to the directions in the book, the steps for determining the path are:
- Start at array slot [0][0], in this case the value is 14
- Check slot [0][1]. As we move right to that slot, a gap is inserted, adding 2 to its value of 12, resulting in 14
- Check slot [1][0], and another gap is inserted resulting in a value of 17
- Check slot [1][1]. As we move diagonally, the penalty value is added to the slot value, giving a result of 14
Since I have two matching possibilities in [0][1] and [1][1], which is to be used for the next step?
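The check described in these steps can be sketched in Python as follows. The table is the 4x4 fragment shown above; the function name consistent_moves and the move labels are made up for illustration, and the diagonal penalty of 1 is an assumption, since the underlying sequences are not given.

```python
# The 4x4 fragment from the question.
table = [
    [14, 12, 10, 8],
    [15, 13, 11, 9],
    [16, 14, 12, 10],
    [17, 15, 13, 11],
]
GAP = 2


def consistent_moves(table, i, j, penalty):
    """Return the moves whose neighbour value plus move cost equals table[i][j].

    Only valid while i + 1 and j + 1 are still inside the fragment.
    """
    here = table[i][j]
    candidates = {
        "right": table[i][j + 1] + GAP,
        "down": table[i + 1][j] + GAP,
        "diagonal": table[i + 1][j + 1] + penalty,
    }
    return [name for name, value in candidates.items() if value == here]


# Assuming a mismatch (penalty 1) at [0][0]:
print(consistent_moves(table, 0, 0, penalty=1))  # ['right', 'diagonal']
```

With a penalty of 0 at that cell, the diagonal candidate would be 13 rather than 14, and only the move to the right would remain.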
It seems to me that you already applied the weights when you constructed the matrix; at least, this is how the standard variant of SW works. To determine the path, you are likely just following the path to the lower number, without adding the penalty again. I may be wrong; if in doubt, please post a link to the book page in Google Books, or another description matching your book. It seems to me that in your case the best path is 3 steps right: 14-12-10-8.
In either case, whether I am right or wrong in the doubt expressed above, you can still hit an ambiguous point on the path through the matrix. An orthogonal move represents a gap in one of the sequences, while a diagonal move corresponds to aligning two characters (which I think is always a mismatch if the choice is ambiguous, though that would need to be proven). Really, you can follow either path. As long as you reach a zero or the corner, you will get two paths with the same cost. The preference between a gap and a mismatch was already applied when constructing the matrix, encoded in the balance of gap and mismatch penalties, so the case is truly ambiguous.
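To see that tied branches really do end at the same cost, here is a small Python sketch that follows every tied move instead of picking one. Everything in it is hypothetical: the sequences "AA" and "A", the hard-coded table (filled by hand with gap 2, mismatch 1, and border cells costing 2 per leftover character), and the name all_paths.

```python
# Made-up example: X indexes the rows, Y indexes the columns.
X, Y = "AA", "A"
GAP, MISMATCH = 2, 1

# Cost table for X and Y, filled by hand from the bottom-right corner.
OPT = [
    [2, 4],
    [0, 2],
    [2, 0],
]


def all_paths(i=0, j=0, top="", bottom=""):
    """Collect every alignment reachable by following tied moves."""
    m, n = len(X), len(Y)
    if i == m and j == n:  # reached the corner
        return [(top, bottom)]
    found = []
    if i < m and j < n:
        penalty = 0 if X[i] == Y[j] else MISMATCH
        if OPT[i][j] == OPT[i + 1][j + 1] + penalty:  # diagonal move
            found += all_paths(i + 1, j + 1, top + X[i], bottom + Y[j])
    if i < m and OPT[i][j] == OPT[i + 1][j] + GAP:  # down: gap in Y
        found += all_paths(i + 1, j, top + X[i], bottom + "-")
    if j < n and OPT[i][j] == OPT[i][j + 1] + GAP:  # right: gap in X
        found += all_paths(i, j + 1, top + "-", bottom + Y[j])
    return found


for top, bottom in all_paths():
    print(top, bottom)
# AA A-
# AA -A
```

Both alignments cost 2, the value in OPT[0][0]. In this particular made-up case the tied diagonal step at [0][0] is a match rather than a mismatch, so the mismatch conjecture is worth checking against your own data.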