Having recently begun working on a project which might need (good) scaling possibilities, I’ve come up with the following question:
Leaving aside the Levenshtein algorithm itself (I’m working with/on different variations), I iterate through each dictionary word and calculate the Levenshtein distance between the dictionary word and each of the words in my input string. Something along the lines of:
<?php
// $dictionary_words is assumed to be loaded elsewhere (one word per element).
$input_words = array("this", "is", "a", "test");
$distances = array();
foreach ($dictionary_words as $dictionary_word) {
    foreach ($input_words as $input_word) {
        $ld = levenshtein($input_word, $dictionary_word);
        // Keep the smallest distance seen so far for this input word.
        if (!isset($distances[$input_word]) || $ld < $distances[$input_word]) {
            $distances[$input_word] = $ld;
        }
    }
}
?>
My question is about best practice: execution time is currently ~1-2 seconds.
I’m thinking of running a “dictionary server” which, upon startup, loads the dictionary words into memory and then iterates as part of the spell check (as described above) when a request is received. Will this decrease execution time, or is the slow part the iteration (the for loops)? If so, is there anything I can do to optimize properly?
Google’s “Did you mean: ?” doesn’t take several seconds to check the same input string 😉
Thanks in advance, and happy New Year.
Read Norvig’s How to Write a Spelling Corrector. Although the article uses Python, others have implemented it in PHP here and here.
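For what it's worth, the core idea can be sketched in PHP roughly like this. Instead of calling levenshtein() against every dictionary word, you generate every string within one or two edits of the input word and test each one with a hash lookup, so the cost depends on the length of the word rather than the size of the dictionary. The $counts table below is made up for illustration (a real one would come from counting words in a large corpus), the function names loosely follow the article, and the sketch assumes lowercase a-z words only.

```php
<?php
// Hypothetical word => frequency table; real counts would come from a corpus.
$counts = array("this" => 50, "is" => 80, "a" => 100, "test" => 10, "the" => 120);

// All strings exactly one edit (delete, transpose, replace, insert) from $word.
function edits1($word) {
    $letters = str_split("abcdefghijklmnopqrstuvwxyz");
    $edits = array();
    $n = strlen($word);
    for ($i = 0; $i <= $n; $i++) {
        $left = (string) substr($word, 0, $i);
        $right = (string) substr($word, $i);   // cast: substr() may return false before PHP 8
        if ($right !== "") {
            $edits[$left . substr($right, 1)] = true;                          // delete
        }
        if (strlen($right) > 1) {
            $edits[$left . $right[1] . $right[0] . substr($right, 2)] = true;  // transpose
        }
        foreach ($letters as $c) {
            if ($right !== "") {
                $edits[$left . $c . substr($right, 1)] = true;                 // replace
            }
            $edits[$left . $c . $right] = true;                                // insert
        }
    }
    return array_keys($edits);   // keys are used to de-duplicate
}

// Keep only the candidates that are dictionary words; isset() is a hash lookup.
function known($words, $counts) {
    $found = array();
    foreach ($words as $w) {
        if (isset($counts[$w])) {
            $found[$w] = true;
        }
    }
    return array_keys($found);
}

// Return the most frequent known word within two edits, or $word itself if none.
function correct($word, $counts) {
    if (isset($counts[$word])) {
        return $word;
    }
    $e1 = edits1($word);
    $candidates = known($e1, $counts);
    if (!$candidates) {
        // Nothing at distance 1: try distance 2 by editing each edit again.
        $e2 = array();
        foreach ($e1 as $e) {
            foreach (known(edits1($e), $counts) as $w) {
                $e2[$w] = true;
            }
        }
        $candidates = array_keys($e2);
    }
    if (!$candidates) {
        return $word;
    }
    $best = $candidates[0];
    foreach ($candidates as $c) {
        if ($counts[$c] > $counts[$best]) {
            $best = $c;
        }
    }
    return $best;
}
```

Calling correct("tset", $counts) with the table above returns "test", because the transposition of "s" and "e" is one of the generated edits and it is the only one present in $counts. If you do go the "dictionary server" route, the table only has to be built once at startup; each request is then a few hundred hash lookups per word for distance 1.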