I’m trying to write a spellchecker module. It loads a text, creates a dictionary

Question

0

Asked: June 1, 20262026-06-01T12:09:16+00:00 2026-06-01T12:09:16+00:00

I’m trying to write a spellchecker module. It loads a text, creates a dictionary

0

I’m trying to write a spellchecker module.

It loads a text, creates a dictionary from 16 mb file and then checks if encountered word is similar to the word in dictionary (similar = varies up to two chars) if so then it changes it to the form from dictionary.

Right now I’m using a Levenshtein Distance algorithm and processing of a 50 words set takes 3 min…

I’m pretty sure that there must be a faster solution.
Profiler told me that my app spends more than 80% of it’s time in Levenshtein Distance function.

Are there any better solutions/algorithms?

Here is the implemented of version of the algorithm I use:

def levenshteinDistance(s1, s2):
    l_s1 = len(s1)
    l_s2 = len(s2)
    d = [[a for a in genM(x, l_s2 + 1)] for x in xrange(l_s1 + 1)]
    for i in xrange(1, l_s1 + 1):
        for j in xrange(1, l_s2 + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + decide_of_equality(s1[i - 1],s2[j - 1]))
    return d[l_s1][l_s2]

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-01T12:09:17+00:00

I have used Norvig’s spell corrector, mentioned in the comments and it is awesome.

However coming to your problem, you have written a dynamic programming edit distance Algorithm. Your algorithm qualifies to be a data parallel algorithm. On a shared memory i.e on a single machine if you have multi cores you can exploit them. Do you know something called map-reduce? Please dont think distributed and all right now, just consider one single quad core machine and a shared memory. As a step 1 you can partition your dictionary and allocate a portion to each thread which will run edit distance on a portion of dictionary (similar to a map step). Later all your threads will return you all the words at an edit distance of 2 (similar to reduce step). This way your program will benefit from multi core architecture.

Another thing I could think of is inside your python code write the cpu intensive edit distance algorithm in C i.e by writing a python extension.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m trying to write a spellchecker module. It loads a text, creates a dictionary

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply