I wrote a little program that tries to find a connection between two equal length English words. Word A will transform into Word B by changing one letter at a time, each newly created word has to be an English word.
For example:
Word A = BANG
Word B = DUST
Result:
BANG -> BUNG ->BUNT -> DUNT -> DUST
My process:
-
Load an English wordlist(consist of 109582 words) into a
Map<Integer, List<String>> _wordMap = new HashMap();, key will be the word length. -
User put in 2 words.
-
createGraph creates a graph.
-
calculate the shortest path between those 2 nodes
-
prints out the result.
Everything works perfectly fine, but I am not satisfied with the time it took in step 3.
See:
Completely loaded 109582 words!
CreateMap took: 30 milsecs
CreateGraph took: 17417 milsecs
(HOISE : HORSE)
(HOISE : POISE)
(POISE : PRISE)
(ARISE : PRISE)
(ANISE : ARISE)
(ANILE : ANISE)
(ANILE : ANKLE)
The wholething took: 17866 milsecs
I am not satisfied with the time it takes create the graph in step 3, here’s my code for it(I am using JgraphT for the graph):
private List<String> _wordList = new ArrayList(); // list of all 109582 English words
private Map<Integer, List<String>> _wordMap = new HashMap(); // Map grouping all the words by their length()
private UndirectedGraph<String, DefaultEdge> _wordGraph =
new SimpleGraph<String, DefaultEdge>(DefaultEdge.class); // Graph used to calculate the shortest path from one node to the other.
private void createGraph(int wordLength){
long before = System.currentTimeMillis();
List<String> words = _wordMap.get(wordLength);
for(String word:words){
_wordGraph.addVertex(word); // adds a node
for(String wordToTest : _wordList){
if (isSimilar(word, wordToTest)) {
_wordGraph.addVertex(wordToTest); // adds another node
_wordGraph.addEdge(word, wordToTest); // connecting 2 nodes if they are one letter off from eachother
}
}
}
System.out.println("CreateGraph took: " + (System.currentTimeMillis() - before)+ " milsecs");
}
private boolean isSimilar(String wordA, String wordB) {
if(wordA.length() != wordB.length()){
return false;
}
int matchingLetters = 0;
if (wordA.equalsIgnoreCase(wordB)) {
return false;
}
for (int i = 0; i < wordA.length(); i++) {
if (wordA.charAt(i) == wordB.charAt(i)) {
matchingLetters++;
}
}
if (matchingLetters == wordA.length() - 1) {
return true;
}
return false;
}
My question:
How can I improve my algorithm inorder to speed up the process?
For any redditors that are reading this, yes I created this after seeing the thread from /r/askreddit yesterday.
Here’s a starting thought:
Create a
Map<String, List<String>>(or aMultimap<String, String>if you’ve using Guava), and for each word, “blank out” one letter at a time, and add the original word to the list for that blanked out word. So you’d end up with:At that point, given a word, you can very easily find all the words it’s similar to – just go through the same process again, but instead of adding to the map, just fetch all the values for each “blanked out” version.