The Archive Base

Editorial Team
Asked: May 26, 2026 (2026-05-26T07:24:32+00:00)

I’ve been tasked with creating a simple spell checker for an assignment but have


I’ve been tasked with creating a simple spell checker for an assignment but have been given next to no guidance, so I was wondering if anyone could help me out. I’m not after someone to do the assignment for me, but any direction or help with the algorithm would be awesome! If what I’m asking is not within the guidelines of the site then I’m sorry and I’ll look elsewhere. 🙂

The project loads correctly spelled lower case words and then needs to make spelling suggestions based on two criteria:

  • One letter difference (either added or subtracted to get the word the same as a word in the dictionary). For example ‘stack’ would be a suggestion for ‘staick’ and ‘cool’ would be a suggestion for ‘coo’.

  • One letter substitution. So for example, ‘bad’ would be a suggestion for ‘bod’.

So, just to make sure I’ve explained properly.. I might load in the words [hello, goodbye, fantastic, good, god] and then the suggestions for the (incorrectly spelled) word ‘godd’ would be [good, god].
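The two criteria above can be checked directly for a pair of words. The following is only a minimal sketch, assuming plain lower-case ASCII words; the name isOneEditAway is made up for illustration and is not part of the assignment.

```cpp
#include <cstddef>
#include <string>

// Returns true if `a` and `b` differ by exactly one added letter, one removed
// letter, or one substituted letter. Identical words return false.
bool isOneEditAway(const std::string& a, const std::string& b) {
    const std::string& shorter = a.size() <= b.size() ? a : b;
    const std::string& longer  = a.size() <= b.size() ? b : a;
    if (longer.size() - shorter.size() > 1) return false;

    // Skip the common prefix.
    std::size_t i = 0;
    while (i < shorter.size() && shorter[i] == longer[i]) ++i;

    if (shorter.size() == longer.size()) {
        if (i == shorter.size()) return false;  // the words are identical
        // Substitution: everything after the mismatch must be the same.
        return shorter.compare(i + 1, std::string::npos,
                               longer, i + 1, std::string::npos) == 0;
    }
    // One extra letter in `longer`: skip it and compare the remainders.
    return shorter.compare(i, std::string::npos,
                           longer, i + 1, std::string::npos) == 0;
}
```

With the example dictionary, isOneEditAway("godd", "good") and isOneEditAway("godd", "god") both hold, while isOneEditAway("godd", "hello") does not.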

Speed is my main consideration here, so while I think I know a way to get this to work, I’m really not too sure about how efficient it’ll be. The way I’m thinking of doing it is to create a map<string, vector<string>> and then, for each correctly spelled word that’s loaded in, add the correctly spelled word as a key in the map and then populate the vector with all the possible ‘wrong’ permutations of that word.

Then, when I want to look up a word, I’ll look through every vector in the map to see if that word is a permutation of one of the correctly spelled words. If it is, I’ll add the key as a spelling suggestion.

This seems like it would take up HEAPS of memory though, cause there would surely be thousands of permutations for each word? It also seems like it’d be very very slow if my initial dictionary of correctly spelled words was large?

I was thinking that maybe I could cut down time a bit by only looking in the keys that are similar to the word I’m looking at. But then again, if they’re similar in some way then it probably means that the key will be a suggestion meaning I don’t need all those permutations!

So yeah, I’m a bit stumped about which direction I should look in. I’d really appreciate any help as I really am not sure how to estimate the speed of the different ways of doing things (we haven’t been taught this at all in class).

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 7:24 am

    The simplest way to solve the problem is indeed a precomputed map [bad word] -> [suggestions].

    The problem is that while the removal of a letter creates few “bad words”, for the addition or substitution you have many candidates.
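To get a feel for how many “bad words” a single dictionary entry produces, here is a small sketch that enumerates them. It assumes a lower-case a-z alphabet, and the name oneEditVariants is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// All distinct strings reachable from `word` by removing one letter,
// substituting one letter, or adding one letter (alphabet assumed to be a-z).
std::unordered_set<std::string> oneEditVariants(const std::string& word) {
    std::unordered_set<std::string> variants;
    for (std::size_t i = 0; i < word.size(); ++i) {
        std::string removed = word;
        removed.erase(i, 1);
        variants.insert(removed);
        for (char c = 'a'; c <= 'z'; ++c) {
            if (c == word[i]) continue;  // would reproduce the word itself
            std::string substituted = word;
            substituted[i] = c;
            variants.insert(substituted);
        }
    }
    for (std::size_t i = 0; i <= word.size(); ++i) {
        for (char c = 'a'; c <= 'z'; ++c) {
            std::string added = word;
            added.insert(i, 1, c);
            variants.insert(added);
        }
    }
    return variants;
}
```

For the three-letter word god this yields 3 removals, 75 substitutions and 101 distinct additions, so 179 keys for one short entry. The count grows with word length, which is why the precomputed map gets large.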

    So I would suggest another solution 😉

    Note: the edit distance you are describing is called the Levenshtein Distance.

    The solution is described in incremental steps: the search speed should normally improve with each idea, and I have tried to order them with the simpler ideas (in terms of implementation) first. Feel free to stop whenever you’re comfortable with the results.


    0. Preliminary

    • Implement the Levenshtein Distance algorithm
    • Store the dictionary in a sorted sequence (std::set for example, though a sorted std::deque or std::vector would be better performance-wise)

    Key Points:

    • The Levenshtein Distance computation uses an array; at each step the next row is computed solely from the previous row
    • The minimum distance in a row is always greater than or equal to the minimum in the previous row

    The latter property allows a short-circuit implementation: if you want to limit yourself to 2 errors (the threshold), then whenever the minimum of the current row is greater than 2, you can abandon the computation. A simple strategy is to return threshold + 1 as the distance.
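A possible shape for the short-circuited distance, as a sketch only: it keeps two rows, tracks the minimum of the current row and gives up once that minimum exceeds the threshold. The function name boundedLevenshtein and its parameter names are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance between `word` and `reference`. The computation is
// abandoned as soon as a row minimum exceeds `threshold`; in that case, and
// whenever the true distance exceeds it, threshold + 1 is returned.
std::size_t boundedLevenshtein(const std::string& word,
                               const std::string& reference,
                               std::size_t threshold) {
    const std::size_t n = word.size();
    std::vector<std::size_t> previous(n + 1), current(n + 1);
    for (std::size_t j = 0; j <= n; ++j) previous[j] = j;  // row for the empty prefix

    for (std::size_t i = 1; i <= reference.size(); ++i) {
        current[0] = i;
        std::size_t rowMin = current[0];
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t cost = (reference[i - 1] == word[j - 1]) ? 0 : 1;
            current[j] = std::min({previous[j] + 1,           // extra letter in reference
                                   current[j - 1] + 1,        // extra letter in word
                                   previous[j - 1] + cost});  // match or substitution
            rowMin = std::min(rowMin, current[j]);
        }
        if (rowMin > threshold) return threshold + 1;  // later rows cannot do better
        std::swap(previous, current);
    }
    return std::min(previous[n], threshold + 1);
}
```

Only one letter of the reference is consumed per row, which is the property the later steps build on.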


    1. First Attempt

    Let’s begin simple.

    We’ll implement a linear scan: for each word we compute the distance (short-circuited) and we keep the list of words that achieved the smallest distance so far.

    It works very well on smallish dictionaries.
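The linear scan could look roughly like this. A compact short-circuited distance is repeated so the block compiles on its own; distanceUpTo and suggest are hypothetical names, and the dictionary is assumed to be a std::vector of lower-case words.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Compact short-circuited Levenshtein distance (threshold + 1 when abandoned).
std::size_t distanceUpTo(const std::string& word, const std::string& ref,
                         std::size_t threshold) {
    std::vector<std::size_t> prev(word.size() + 1), cur(word.size() + 1);
    for (std::size_t j = 0; j < prev.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        std::size_t rowMin = cur[0] = i;
        for (std::size_t j = 1; j < cur.size(); ++j) {
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ref[i - 1] == word[j - 1] ? 0u : 1u)});
            rowMin = std::min(rowMin, cur[j]);
        }
        if (rowMin > threshold) return threshold + 1;
        std::swap(prev, cur);
    }
    return std::min(prev[word.size()], threshold + 1);
}

// Linear scan keeping the words with the smallest distance seen so far.
std::vector<std::string> suggest(const std::vector<std::string>& dictionary,
                                 const std::string& word, std::size_t threshold) {
    std::vector<std::string> best;
    std::size_t bestDistance = threshold;
    for (const std::string& ref : dictionary) {
        // Using the best distance so far as the bound tightens the short-circuit.
        const std::size_t d = distanceUpTo(word, ref, bestDistance);
        if (d > bestDistance) continue;
        if (d < bestDistance) { best.clear(); bestDistance = d; }
        best.push_back(ref);
    }
    return best;
}
```

With the dictionary [hello, goodbye, fantastic, good, god], the word godd and a threshold of 1, this returns [good, god].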


    2. Improving the data structure

    The Levenshtein distance is at least equal to the difference in length.

    By using the pair (length, word) as a key instead of just word, you can restrict your search to the length range [length - edit, length + edit] and greatly reduce the search space.
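One way to sketch the (length, word) key is a sorted std::vector of pairs, with two binary searches delimiting the length range. The alias Entry and the functions buildByLength and candidatesByLength are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::size_t, std::string>;  // (length, word)

// Build the dictionary sorted by length first, then alphabetically.
std::vector<Entry> buildByLength(const std::vector<std::string>& words) {
    std::vector<Entry> entries;
    entries.reserve(words.size());
    for (const std::string& w : words) entries.emplace_back(w.size(), w);
    std::sort(entries.begin(), entries.end());
    return entries;
}

// Words whose length lies in [length - edit, length + edit].
std::vector<std::string> candidatesByLength(const std::vector<Entry>& entries,
                                            std::size_t length, std::size_t edit) {
    const std::size_t lo = length > edit ? length - edit : 0;
    const std::size_t hi = length + edit;
    // (lo, "") is the smallest possible key of length lo, (hi + 1, "") the
    // smallest key just past the range.
    auto first = std::lower_bound(entries.begin(), entries.end(), Entry(lo, std::string()));
    auto last = std::lower_bound(entries.begin(), entries.end(), Entry(hi + 1, std::string()));
    std::vector<std::string> out;
    for (auto it = first; it != last; ++it) out.push_back(it->second);
    return out;
}
```

Only the words returned by candidatesByLength need a distance computation; everything outside the range is already known to be too far away.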


    3. Prefixes and pruning

    To improve on this, note that when we build the distance matrix row by row, one word is scanned entirely for each row (the word we are looking for), but the other (the reference) is not: we only use one of its letters for each row.

    This very important property means that for two references that share the same initial sequence (prefix), the first rows of the matrix will be identical.

    Remember that I asked you to store the dictionary sorted? It means that words that share the same prefix are adjacent.

    Suppose that you are checking your word against cartoon and at car you realize it does not work (the distance is already too large); then any word beginning with car won’t work either, and you can skip words as long as they begin with car.

    The skip itself can be done either linearly or with a binary search (find the first word whose prefix is greater than car):

    • linear works best if the prefix is long (few words to skip)
    • binary search works best for short prefixes (many words to skip)

    How long is “long” depends on your dictionary and you’ll have to measure. I would go with the binary search to begin with.

    Note: the length partitioning works against the prefix partitioning (once the dictionary is keyed by length first, words sharing a prefix are no longer all adjacent), but it prunes much more of the search space.
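The prefix skip could be sketched as follows, using the binary-search variant via std::partition_point on a sorted std::vector. The helper failingPrefix and the function suggestWithPruning are hypothetical names, and the length partitioning of the previous step is left out to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns how many leading letters of `ref` had been consumed when the row
// minimum first exceeded `threshold`, or 0 if that never happened. When it
// returns 0, `distance` holds the full Levenshtein distance.
std::size_t failingPrefix(const std::string& word, const std::string& ref,
                          std::size_t threshold, std::size_t& distance) {
    std::vector<std::size_t> prev(word.size() + 1), cur(word.size() + 1);
    for (std::size_t j = 0; j < prev.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        std::size_t rowMin = cur[0] = i;
        for (std::size_t j = 1; j < cur.size(); ++j) {
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ref[i - 1] == word[j - 1] ? 0u : 1u)});
            rowMin = std::min(rowMin, cur[j]);
        }
        if (rowMin > threshold) return i;
        std::swap(prev, cur);
    }
    distance = prev[word.size()];
    return 0;
}

// Scan a sorted dictionary, skipping every word that shares a failing prefix.
std::vector<std::string> suggestWithPruning(const std::vector<std::string>& sorted,
                                            const std::string& word,
                                            std::size_t threshold) {
    std::vector<std::string> out;
    auto it = sorted.begin();
    while (it != sorted.end()) {
        std::size_t distance = 0;
        const std::size_t k = failingPrefix(word, *it, threshold, distance);
        if (k == 0) {
            if (distance <= threshold) out.push_back(*it);
            ++it;
            continue;
        }
        // Words starting with this prefix are adjacent: jump past all of them.
        const std::string prefix = it->substr(0, k);
        it = std::partition_point(it, sorted.end(), [&](const std::string& w) {
            return w.compare(0, k, prefix) == 0;
        });
    }
    return out;
}
```

For the word godd with threshold 1, checking car already fails at the prefix ca, so cartoon and carwash are skipped without any distance computation.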


    4. Prefixes and re-use

    Now, we’ll also try to re-use the computation as much as possible (and not just the “it does not work” result)

    Suppose that you have two words:

    • cartoon
    • carwash

    You first compute the matrix, row by row, for cartoon. Then when reading carwash you need to determine the length of the common prefix (here car) and you can keep the first 4 rows of the matrix (corresponding to void, c, a, r).

    Therefore, when you begin computing carwash, you in fact begin iterating at w.

    To do this, simply use an array allocated once at the beginning of your search, and make it large enough to accommodate the longest reference word (you should know the largest length in your dictionary).
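A sketch of the row re-use, under the assumption that the dictionary is a sorted std::vector and that the whole matrix is kept between words. The function name distancesWithReuse is made up, and the short-circuit is left out so the re-use itself stays visible.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Distance from `word` to every entry of a sorted dictionary, recomputing only
// the matrix rows that are not shared with the previous entry's prefix.
std::vector<std::pair<std::string, std::size_t>>
distancesWithReuse(const std::vector<std::string>& sorted, const std::string& word) {
    std::size_t maxLength = 0;
    for (const std::string& ref : sorted) maxLength = std::max(maxLength, ref.size());

    const std::size_t n = word.size();
    // Allocated once: row 0 for the empty prefix, then one row per reference letter.
    std::vector<std::vector<std::size_t>> rows(maxLength + 1,
                                               std::vector<std::size_t>(n + 1));
    for (std::size_t j = 0; j <= n; ++j) rows[0][j] = j;

    std::vector<std::pair<std::string, std::size_t>> result;
    std::string previous;  // rows 1..previous.size() currently describe this word
    for (const std::string& ref : sorted) {
        // Length of the prefix shared with the previous reference.
        std::size_t common = 0;
        while (common < ref.size() && common < previous.size() &&
               ref[common] == previous[common]) {
            ++common;
        }
        // Rows 0..common are still valid; only the remaining rows are recomputed.
        for (std::size_t i = common + 1; i <= ref.size(); ++i) {
            rows[i][0] = i;
            for (std::size_t j = 1; j <= n; ++j) {
                rows[i][j] = std::min({rows[i - 1][j] + 1, rows[i][j - 1] + 1,
                                       rows[i - 1][j - 1] +
                                           (ref[i - 1] == word[j - 1] ? 0u : 1u)});
            }
        }
        result.emplace_back(ref, rows[ref.size()][n]);
        previous = ref;
    }
    return result;
}
```

Going from cartoon to carwash, the rows for the empty prefix, c, a and r are kept and the loop restarts at w.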


    5. Using a “better” data structure

    To have an easier time working with prefixes, you could use a Trie or a Patricia Tree to store the dictionary. However, it’s not an STL data structure, and you would need to augment it to store in each subtree the range of lengths of the words it contains, so you’ll have to write your own implementation. It’s not as easy as it seems, because there are memory explosion issues which can kill locality.

    This is a last resort option. It’s costly to implement.
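For reference, a very small trie with the length range stored per subtree might look like the sketch below. It uses a std::map per node for brevity, which does nothing about the memory and locality concerns mentioned above; TrieNode, insertWord, searchTrie and suggestFromTrie are all hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool isWord = false;
    std::size_t minLength = static_cast<std::size_t>(-1);  // shortest word in subtree
    std::size_t maxLength = 0;                             // longest word in subtree
};

void insertWord(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    node->minLength = std::min(node->minLength, word.size());
    node->maxLength = std::max(node->maxLength, word.size());
    for (char c : word) {
        std::unique_ptr<TrieNode>& child = node->children[c];
        if (!child) child.reset(new TrieNode());
        node = child.get();
        node->minLength = std::min(node->minLength, word.size());
        node->maxLength = std::max(node->maxLength, word.size());
    }
    node->isWord = true;
}

// `row` is the matrix row for the prefix spelled by the path to `node`.
void searchTrie(const TrieNode& node, const std::string& word, std::size_t threshold,
                const std::vector<std::size_t>& row, std::string& prefix,
                std::vector<std::string>& out) {
    // Length pruning: every word below this node is too short or too long.
    if (node.minLength > word.size() + threshold ||
        node.maxLength + threshold < word.size()) return;
    if (node.isWord && row.back() <= threshold) out.push_back(prefix);
    for (const auto& entry : node.children) {
        std::vector<std::size_t> next(row.size());
        std::size_t rowMin = next[0] = row[0] + 1;
        for (std::size_t j = 1; j < next.size(); ++j) {
            next[j] = std::min({row[j] + 1, next[j - 1] + 1,
                                row[j - 1] + (word[j - 1] == entry.first ? 0u : 1u)});
            rowMin = std::min(rowMin, next[j]);
        }
        if (rowMin > threshold) continue;  // prefix pruning: skip the whole subtree
        prefix.push_back(entry.first);
        searchTrie(*entry.second, word, threshold, next, prefix, out);
        prefix.pop_back();
    }
}

std::vector<std::string> suggestFromTrie(const TrieNode& root, const std::string& word,
                                         std::size_t threshold) {
    std::vector<std::size_t> row(word.size() + 1);
    for (std::size_t j = 0; j < row.size(); ++j) row[j] = j;
    std::string prefix;
    std::vector<std::string> out;
    searchTrie(root, word, threshold, row, prefix, out);
    return out;
}
```

Each row is computed once per trie node, so the prefix re-use and the prefix pruning of the earlier steps both fall out of the traversal.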
