Hey. I know this is not a ‘refactor my code’ site, but I made this little piece of code which works perfectly fine with moderately sized input but becomes problematic with strings longer than, say, 2000 characters.
What it does: it takes a string of digits as a parameter and returns the number of ways it can be interpreted as a string of letters, where each letter in the English alphabet is assigned a numeric value according to its position in the alphabet: A -> 1, B -> 2, ..., Z -> 26.
Since some letters are represented by two digits, the way to split the string into letters is not unique, so there can be multiple interpretations. For example, ‘111’ could be ‘AAA’, ‘KA’ or ‘AK’.
This is my code. It’s fairly readable and straightforward but it’s problematic because:
- It has to copy part of the string every time in order to pass it as the argument to the recursive call.
- It has to store huge strings as keys in the cache, so it’s very expensive memory-wise.
- … it’s recursive.
Help much appreciated 🙂
cache = dict()

def alpha_code(numbers):
    """
    Returns the number of ways a string of numbers
    can be interpreted as an alphabetic sequence.
    """
    global cache
    if numbers in cache: return cache[numbers]

    ## check the basic cases
    if numbers.startswith('0'): return 0
    if len(numbers) <= 1: return 1

    ## dynamic programming part
    ## obviously we can treat the first (non-zero)
    ## digit as a single letter and continue -
    ## '342...' -> C + '42...'
    total = alpha_code(numbers[1:])

    ## the first two digits make for a legal letter
    ## iff this condition holds
    ## '2511...' -> Y + '11...'
    ## '3711...' -> illegal
    if numbers[:2] <= '26':
        total += alpha_code(numbers[2:])

    cache[numbers] = total
    return total
Try using a dynamic programming approach instead:
Fill in the rest of the array from left to right, via the following rule (pseudocode):
P[x] =
    (if current character is ‘0’ then 0, else P[x-1])
    +
    (if previous character + current character can be interpreted as a letter
        then P[x-2] else 0)
(Note that if P[x] is ever 0 you should return zero, since that means there was a ‘0’ that cannot be paired with a ‘1’ or ‘2’ right before it, for example two 0’s in a row or ‘30’, which your rules don’t seem to allow.)
The first portion of the sum is to deal with the case where the current character is interpreted as a letter; the second part of the sum is to deal with the case where the 2 most recent characters are interpreted as a letter.
Essentially, P[x] will be equal to the number of ways that the entirety of the string from the start up to position x can be interpreted as letters. Since you can determine this from looking at previous results, you only need to loop through the contents of the string once: O(N) time instead of the O(2^N) of a plain recursion without caching, which is a huge improvement. Your final result is simply P[len(input)-1], since “everything from the start up to the end” is the same as just “the entire string”.
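Here is a rough Python sketch of that rule, in case it helps. The function name count_decodings is made up, and a few details are my own assumptions rather than something stated above: P[0] is 1 unless the string starts with ‘0’, a two-digit letter at the very start counts as one way (standing in for the non-existent P[-1]), and an empty string returns 1 to match what alpha_code does.

```python
def count_decodings(digits):
    """Count the ways `digits` can be read as letters (A=1 ... Z=26).

    Assumes `digits` contains only the characters '0'-'9'.
    """
    if not digits:
        return 1  # assumption: same result alpha_code('') gives

    # p[x] = number of ways to decode digits[0..x] (inclusive)
    p = [0] * len(digits)
    p[0] = 0 if digits[0] == "0" else 1  # assumed base case

    for x in range(1, len(digits)):
        # Case 1: the current digit is a letter on its own (1..9).
        single = p[x - 1] if digits[x] != "0" else 0

        # Case 2: previous + current digit form a letter (10..26).
        # For two-character digit strings, string comparison agrees
        # with numeric comparison.
        pair_ok = "10" <= digits[x - 1:x + 1] <= "26"
        before_pair = p[x - 2] if x >= 2 else 1  # assumed stand-in for P[-1]
        double = before_pair if pair_ok else 0

        p[x] = single + double
        if p[x] == 0:
            return 0  # a '0' that cannot be paired: no valid reading

    return p[-1]


print(count_decodings("111"))   # 3
print(count_decodings("1111"))  # 5
```

There is no recursion and no substring cache, so long inputs are not a problem for the interpreter's recursion limit. Since each step only looks at the two previous entries, the list could also be replaced by two variables if memory matters.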
Example run for your very basic input case of ‘111’:
P[0] = 1 (‘1’: A)
P[1] = 2 (‘11’: AA or K)
P[2] = P[1] + P[0] = 2 + 1 = 3
Since P[2] is our last result, and it’s 3, our answer is 3.
If the string were ‘1111’ instead, we’d continue another step:
P[3] = P[2] + P[1] = 3 + 2 = 5
The answer is indeed 5 – valid interpretations being AAAA, KK, AKA, AAK, KAA. Notice how those 5 potential answers are built up from the potential interpretations of ‘11’ and ‘111’:
‘11’: AA or K
‘111’: AAA or KA or AK
‘111’+A: AAA+A or KA+A or AK+A
‘11’+K: AA+K or K+K
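If you want to see that build-up happen, the same idea can list the interpretations instead of counting them. This is only an illustrative sketch: list_decodings and letter are names I made up, and since the number of interpretations can grow exponentially it is only meant for checking short strings.

```python
def list_decodings(digits):
    """Return every letter string `digits` can stand for (A=1 ... Z=26).

    Assumes `digits` contains only the characters '0'-'9'.
    """
    def letter(code):
        # '1' -> 'A', '11' -> 'K', '26' -> 'Z'
        return chr(ord("A") + int(code) - 1)

    # prev1: all decodings of the prefix ending one character back
    # prev2: all decodings of the prefix ending two characters back
    prev2, prev1 = [], [""]  # the empty prefix has one (empty) decoding

    for x, ch in enumerate(digits):
        current = []
        # Extend every decoding of the shorter prefix by a single letter.
        if ch != "0":
            current += [d + letter(ch) for d in prev1]
        # Extend every decoding of the prefix two back by a two-digit letter.
        if x >= 1 and "10" <= digits[x - 1:x + 1] <= "26":
            current += [d + letter(digits[x - 1:x + 1]) for d in prev2]
        prev2, prev1 = prev1, current

    return prev1


print(list_decodings("111"))   # ['AAA', 'KA', 'AK']
print(list_decodings("1111"))  # ['AAAA', 'KAA', 'AKA', 'AAK', 'KK']
```

The length of the returned list is the count that P[x] tracks, which makes this handy for sanity-checking a counting implementation on small inputs.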