
Editorial Team
Asked: June 1, 2026

I will have to perform a spelling check-like operation in Python as follows:

I have a huge list of words (let’s call it the lexicon). I am now given some text (let’s call it the sample). I have to search for each sample word in the lexicon. If I cannot find it, that sample word is an error.

In short – a brute-force spelling checker. However, searching through the lexicon linearly for each sample word is bound to be slow. What’s a better method to do this?

The complicating factor is that neither the sample nor the lexicon is in English. Both are in a language that, instead of 26 characters, can have over 300, stored in Unicode.

A suggestion of any algorithm / data structure / parallelization method will be helpful. Algorithms which have high speed at the cost of less than 100% accuracy would be perfect, since I don’t need 100% accuracy. I know about Norvig’s algorithm for this, but it seems English-specific.

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 2:53 pm

    You can use a set of Unicode strings:

    s = set([u"rabbit", u"lamb", u"calf"])
    

    and use the in operator to check whether a word occurs:

    >>> u"rabbit" in s
    True
    >>> u"wolf" in s
    False
    

    This look-up is O(1) on average, because a set is a hash table, so the look-up time does not grow with the size of the dictionary.
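As a rough illustration of the set-based look-up, here is a minimal Python 3 sketch (where every `str` is already Unicode). The names `normalize` and `find_unknown_words` are hypothetical, and the NFC normalization step is an assumption added here rather than part of the answer; it is included because the same visible word can be stored as different code point sequences, which would otherwise make an exact match fail.

```python
import unicodedata


def normalize(word):
    # Assumption: NFC normalization, so precomposed and decomposed
    # spellings of the same word compare equal.
    return unicodedata.normalize("NFC", word)


def find_unknown_words(lexicon, sample_words):
    """Return the sample words that are absent from the lexicon, in order."""
    # Building the set costs O(len(lexicon)) and happens once.
    known = {normalize(w) for w in lexicon}
    # Each membership test is a hash look-up, O(1) on average.
    return [w for w in sample_words if normalize(w) not in known]


if __name__ == "__main__":
    lexicon = ["rabbit", "lamb", "calf", "caf\u00e9"]
    # The last sample word is "café" written with a combining accent.
    sample = ["rabbit", "wolf", "cafe\u0301"]
    print(find_unknown_words(lexicon, sample))
```

If the lexicon and the sample are known to come from the same source and encoding pipeline, the normalization step can probably be dropped.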

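    Edit: Here’s the complete code for a (case-sensitive) spell checker (Python 2.6 or 2.7; the print statement does not run on Python 3):

    from io import open  # io.open decodes the files to unicode on Python 2
    import re
    # One lexicon word per line; a set gives average O(1) membership tests.
    with open("dictionary", encoding="utf-8") as f:
        words = set(line.strip() for line in f)
    with open("document", encoding="utf-8") as f:
        # re.UNICODE makes \w match non-ASCII letters on Python 2.
        for w in re.findall(r"\w+", f.read(), re.UNICODE):
            if w not in words:
                print "Misspelled:", w.encode("utf-8")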
    

    (The print assumes your terminal uses UTF-8.)
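For Python 3, the same idea might look like the sketch below. The function names `load_lexicon`, `misspelled` and `check_files` are hypothetical; the file names are the ones used in the answer's code. In Python 3, `\w` is Unicode-aware for `str` patterns, so no flag is needed.

```python
import re

# In Python 3, \w matches Unicode letters and digits for str patterns.
WORD_RE = re.compile(r"\w+")


def load_lexicon(lines):
    """Build a set of words from an iterable of lines, one word per line."""
    return {line.strip() for line in lines if line.strip()}


def misspelled(text, words):
    """Return the tokens of `text` that are not in `words`, in order."""
    return [w for w in WORD_RE.findall(text) if w not in words]


def check_files(dictionary_path="dictionary", document_path="document"):
    """Print every token of the document that is missing from the dictionary."""
    with open(dictionary_path, encoding="utf-8") as f:
        words = load_lexicon(f)
    with open(document_path, encoding="utf-8") as f:
        for w in misspelled(f.read(), words):
            print("Misspelled:", w)


if __name__ == "__main__":
    # In-memory demonstration; call check_files() to run against real files.
    demo_words = load_lexicon(["rabbit\n", "lamb\n", "calf\n"])
    print(misspelled("The rabbit met a wolf", demo_words))
```

One caveat, stated as an assumption rather than a guarantee: `\w` matches letters and digits but generally not combining marks, so in scripts that rely on them `\w+` may split a word at those marks. Splitting on whitespace or using a script-specific tokenizer may then be a better fit.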
