I am trying to solve this challenge on InterviewStreet: https://www.interviewstreet.com/challenges/dashboard/#problem/4edb8abd7cacd
I already have a working algorithm, but I would like to improve its performance. Do you have any suggestions on how to do so?
    # Enter your code here. Read input from STDIN. Print output to STDOUT
    N = gets.to_i
    words = []
    while words.length < N do
      # strip already removes the trailing newline and surrounding whitespace
      words << gets.strip
    end
    words.each do |word|
      count = 0
      word.length.times do |i|
        # compare the suffix starting at i with the whole word
        sub = word[i..-1]
        j = 0
        while j < sub.length && sub[j] == word[j] do
          count += 1
          j += 1
        end
      end
      puts count
    end
Thanks,
Greg
Your algorithm is quadratic in the worst case. For most normal words there is no quadratic behaviour, and it works well enough (thanks to its simplicity, it probably runs faster than more sophisticated algorithms with better worst-case behaviour).
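To make the worst case concrete, here is a small sketch in Python that mirrors the loop from the question and counts character comparisons. The name `count_comparisons` and the two test inputs are assumptions chosen for illustration; a run of a single repeated letter triggers the quadratic behaviour, while a word whose suffixes mismatch immediately stays linear.

```python
def count_comparisons(word):
    """Count character comparisons made by the brute-force loop from the question."""
    comparisons = 0
    n = len(word)
    for i in range(n):
        j = 0
        # Compare the suffix starting at i with the whole word, character by character.
        while i + j < n:
            comparisons += 1
            if word[i + j] != word[j]:
                break
            j += 1
    return comparisons


for n in (10, 100, 1000):
    repetitive = "a" * n             # every suffix matches to its end
    benign = "a" + "b" * (n - 1)     # every proper suffix mismatches at once
    print(n, count_comparisons(repetitive), count_comparisons(benign))
```

For the repetitive input the count grows as n(n+1)/2, for the benign one as 2n-1.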
One algorithm with linear worst-case behaviour is the Z-algorithm. I don’t speak much Ruby, so for the time being, the Python version will have to do:
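A minimal sketch of such a Python version follows. It is not necessarily the listing originally posted, and the function names `z_array` and `similarity_sum` are illustrative assumptions. `similarity_sum` adds the length of the string because the loop in the question also counts the whole string matching itself at index 0.

```python
def z_array(s):
    """Return Z, where Z[k] is the length of the longest common prefix of s and s[k:].

    Z[0] is left at 0 because the entire string is not considered.
    """
    n = len(s)
    z = [0] * n
    left = right = 0  # window s[left..right] is a prefix of s (empty at first)
    for i in range(1, n):
        if i <= right and z[i - left] <= right - i:
            # The match starting at i ends inside the window: copy the known value.
            z[i] = z[i - left]
            continue
        # Otherwise compare explicitly, skipping what the window already guarantees.
        length = right - i + 1 if i <= right else 0
        while i + length < n and s[length] == s[i + length]:
            length += 1
        z[i] = length
        if length > 0:
            left, right = i, i + length - 1
    return z


def similarity_sum(s):
    """Sum of the common-prefix lengths of s with each of its suffixes."""
    return len(s) + sum(z_array(s))


print(z_array("ababaa"))         # [0, 0, 3, 0, 1, 1]
print(similarity_sum("ababaa"))  # 11
```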
Explanation of the algorithm:
The idea is simple (but, like most good ideas, not easy to have). Let us call a (non-empty) substring that is also a prefix of the string a prefix-substring. To avoid recomputation, the algorithm keeps a window: of all prefix-substrings that start before the currently considered index, the one that extends farthest to the right (initially, the window is empty).
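To illustrate the term, the following brute-force sketch lists the longest prefix-substring starting at each index of a sample string. The helper name `longest_prefix_substring` and the sample word are assumptions for illustration only; an empty result means no prefix-substring starts at that index.

```python
def longest_prefix_substring(s, k):
    """Longest substring starting at index k that is also a prefix of s."""
    length = 0
    while k + length < len(s) and s[length] == s[k + length]:
        length += 1
    return s[k:k + length]


s = "ababaa"
for k in range(1, len(s)):
    print(k, repr(longest_prefix_substring(s, k)))
```

For this word, the prefix-substring found at index 2 covers indices 2 to 4, so that range would serve as the window while indices 3 and 4 are processed.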
Variables used and invariants of the algorithm:
- `i`, the index under consideration, starts at 1 (for 0-based indexing; the entire string is not considered) and is incremented to `length - 1`.
- `left` and `right`, the first and last index of the prefix-substring window; invariants:
  1. `left < i`, `left <= right < length(S)`, and either `left > 0` or `right < 1`,
  2. if `left > 0`, then `S[left .. right]` is the maximal common prefix of `S` and `S[left .. ]`,
  3. if `1 <= j < i` and `S[j .. k]` is a prefix of `S`, then `k <= right`.
- `Z`; invariant: for `1 <= k < i`, `Z[k]` contains the length of the longest common prefix of `S[k .. ]` and `S`.

The algorithm:

1. Set `i = 1`, `left = right = 0` (any values with `left <= right < 1` are allowed), and set `Z[j] = 0` for all indices `1 <= j < length(S)`.
2. If `i == length(S)`, stop.
3. If `i > right`, find the length `l` of the longest common prefix of `S` and `S[i .. ]`, and store it in `Z[i]`. If `l > 0`, we have found a window extending farther right than the previous one, so set `left = i` and `right = i+l-1`; otherwise leave them unchanged. Increment `i` and go to 2.
4. Here `left < i <= right`, so the substring `S[i .. right]` is known: since `S[left .. right]` is a prefix of `S`, it is equal to `S[i-left .. right-left]`.

   Now consider the longest common prefix of `S` with the substring starting at index `i - left`. Its length is `Z[i-left]`, hence `S[k] = S[i-left + k]` for `0 <= k < Z[i-left]` and `S[Z[i-left]] ≠ S[i-left+Z[i-left]]`. Now, if `Z[i-left] <= right-i`, then `i + Z[i-left]` is inside the known window, therefore

       S[i + Z[i-left]] = S[i-left + Z[i-left]] ≠ S[Z[i-left]]
       S[i + k] = S[i-left + k] = S[k]   for 0 <= k < Z[i-left]

   and we see that the longest common prefix of `S` and `S[i .. ]` has length `Z[i-left]`. Then set `Z[i] = Z[i-left]`, increment `i`, and go to 2.

   Otherwise, `S[i .. right]` is a prefix of `S` and we check how far it extends, starting the comparison of characters at the indices `right+1` and `right+1 - i`. Let the length be `l`. Set `Z[i] = l`, `left = i`, `right = i + l - 1`, increment `i`, and go to 2.

Since the window never moves left, and the comparisons always start after the end of the window, each character in the string is compared successfully to an earlier character at most once, and for each starting index there is at most one unsuccessful comparison. Therefore the algorithm is linear.
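As a rough empirical check of the linearity argument, the sketch below follows the steps described above and additionally counts character comparisons. The function name `z_with_comparison_count` and the test inputs are assumptions for illustration. The bound of `2 * len(s)` follows from the argument: at most one successful comparison per character and at most one unsuccessful comparison per starting index.

```python
def z_with_comparison_count(s):
    """Return (Z, comparisons), where comparisons counts character comparisons."""
    n = len(s)
    z = [0] * n
    left = right = 0
    comparisons = 0
    for i in range(1, n):
        if i <= right and z[i - left] <= right - i:
            z[i] = z[i - left]  # answered from the window, no comparison needed
            continue
        # Comparisons start just past the window (or at i if i lies outside it).
        length = max(0, right - i + 1)
        while i + length < n:
            comparisons += 1
            if s[length] != s[i + length]:
                break  # the single unsuccessful comparison for this index
            length += 1
        z[i] = length
        if length > 0:
            left, right = i, i + length - 1
    return z, comparisons


for s in ("a" * 1000, "ab" * 500, "a" + "b" * 999):
    z, comparisons = z_with_comparison_count(s)
    assert comparisons <= 2 * len(s)
    print(len(s), comparisons)
```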