I’ve been having a play with Ruby recently and I’ve just completed the Anagrams Code Kata from http://codekata.pragprog.com.
The solution was test driven and utilises the unique prime factorisation theorem; however, it seems to run incredibly slowly. On the 45k file alone it has been running for about 10 minutes so far. Can anyone give me any pointers on improving the performance of my code?
class AnagramFinder
  def initialize
    @words = self.LoadWordsFromFile("dict45k.txt")
  end
  def OutputAnagrams
    hash = self.CalculatePrimeValueHash
    @words.each_index{|i|
      word = @words[i]
      wordvalue = hash[i]
      matches = hash.select{|key,value| value == wordvalue}
      if(matches.length > 1)
        puts("--------------")
        matches.each{|key,value|
          puts(@words[key])
        }
      end
    }
  end
  def CalculatePrimeValueHash
    hash = Hash.new
    @words.each_index{|i|
      word = @words[i]
      value = self.CalculatePrimeWordValue(word)
      hash[i] = value
    }
    hash
  end
  def CalculatePrimeWordValue(word)
    total = 1
    hash = self.GetPrimeAlphabetHash
    word.downcase.each_char {|c|
      value = hash[c]
      total = total * value
    }
    total
  end
  def LoadWordsFromFile(filename)
    contentsArray = []
    f = File.open(filename)
    f.each_line {|line|
      line = line.gsub(/[^a-z]/i, '')
      contentsArray.push line
    }
    contentsArray
  end
  def GetPrimeAlphabetHash
    hash = { "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
  end
end
Frederick Cheung has a few good points, but I thought I might provide you with a few descriptive examples.
I think your main problem is that you create your index in a way that forces you to do linear searches in it.
Your word list (@words) seems to be just a plain array of words.
Then you create your hash index with CalculatePrimeValueHash, with the hash keys being equal to each word's index in @words.
I would consider this a good start, but if you keep it like this, you have to iterate through the whole hash to find which hash keys (i.e. indexes in @words) belong together, and then iterate through those to join them. Since you do that once for every word, the work grows with the square of the word count. That is, the basic problem here is that you do things too granularly.
If you instead were to build this hash with the prime values as hash keys, and have each of them point to an array of the words with that value, you would get a hash index in which the anagrams are already grouped together.
With this kind of structure, the only thing you have to do to write your output, is to just iterate over the hash values and print them, since they are already grouped.
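To make the grouped index concrete, here is a rough sketch of how I would build and print it. The names PRIMES, prime_word_value and group_anagrams are my own, and the five-word list is made up for illustration; the prime table uses the same values as the one in your question.

```ruby
# Letter-to-prime table (same values as in the question), built once.
# PRIMES is a hypothetical name, not something from the original code.
PRIMES = ("a".."z").zip(
  [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
   43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
).to_h.freeze

# Product of the primes for each letter; anagrams share the same product.
def prime_word_value(word)
  word.downcase.each_char.reduce(1) { |total, c| total * PRIMES.fetch(c) }
end

# Build { prime_value => [word, word, ...] } in a single pass.
def group_anagrams(words)
  words.group_by { |word| prime_word_value(word) }
end

words = %w[listen silent enlist google banana] # made-up sample data
index = group_anagrams(words)

# Only groups with more than one word are actual anagram sets.
anagram_groups = index.values.select { |group| group.length > 1 }

anagram_groups.each do |group|
  puts "--------------"
  puts group
end
```

Each word is now looked at once when the index is built, and the output step is a single walk over the hash values instead of a full scan per word.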
Another thing with your code is that it seems to generate a whole bunch of throwaway objects, which will keep your garbage collector busy, and that is generally quite a big choke point in Ruby.
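One source of such objects that I can see in the posted code is GetPrimeAlphabetHash: it builds a fresh 26-entry hash every time CalculatePrimeWordValue is called, which means once per word. A minimal sketch of one way around that, assuming the table is moved into a constant (LETTER_PRIMES and word_value are names I made up):

```ruby
# Built once when the file is loaded, then reused for every word.
# LETTER_PRIMES is a hypothetical name, not part of the original code.
LETTER_PRIMES = {
  "a" => 2,  "b" => 3,  "c" => 5,  "d" => 7,  "e" => 11, "f" => 13,
  "g" => 17, "h" => 19, "i" => 23, "j" => 29, "k" => 31, "l" => 37,
  "m" => 41, "n" => 43, "o" => 47, "p" => 53, "q" => 59, "r" => 61,
  "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89,
  "y" => 97, "z" => 101
}.freeze

# Same calculation as CalculatePrimeWordValue, minus the per-call hash.
def word_value(word)
  total = 1
  word.downcase.each_char { |c| total *= LETTER_PRIMES[c] }
  total
end
```

I would not expect this alone to fix the run time, since the linear searches are the bigger cost, but it removes one allocation per word for free.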
It might also be a good idea to find a benchmark tool and/or a profiler to analyze your code and see where it could be improved upon.
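As a starting point, Ruby's standard library ships a Benchmark module. The sketch below times a per-word linear scan against a grouped index on a small made-up word list; PRIME_TABLE, value_of and the sample data are my own placeholders, so swap in your real word list to get meaningful numbers.

```ruby
require "benchmark"

# Hypothetical helper table with the same letter-to-prime values.
PRIME_TABLE = ("a".."z").zip(
  [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
   43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
).to_h.freeze

def value_of(word)
  word.each_char.reduce(1) { |total, c| total * PRIME_TABLE[c] }
end

# Made-up sample data, repeated to give the timer something to measure.
words  = %w[listen silent enlist google banana] * 100
values = words.map { |w| value_of(w) }

linear_count  = 0
grouped_count = 0

Benchmark.bm(8) do |bm|
  # Scans the whole list once per word, like the select in OutputAnagrams.
  bm.report("linear") do
    linear_count = values.count { |v| values.count(v) > 1 }
  end

  # Builds the grouped index once, then reads the groups off directly.
  bm.report("grouped") do
    groups = words.group_by { |w| value_of(w) }
    grouped_count = groups.values.select { |g| g.size > 1 }.flatten.size
  end
end
```

Both reports count the same thing (words that have at least one partner with the same value), so the two counts should agree while the timings show how the approaches scale.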