The Archive Base

Editorial Team
Asked: May 13, 2026

I need to write an efficient algorithm for looking up words with missing letters


I need to write an efficient algorithm for looking up words with missing letters in a dictionary, returning the set of possible words.

For example, if I have th??e, I might get back "these", "those", "theme", "there", etc.

There will be at most TWO question marks, and when two question marks occur, they are adjacent.

I was wondering if anyone can suggest data structures or algorithms I should use.

A Trie is too space-inefficient and would make it too slow. Any other ideas or modifications?

Currently I am using 3 hash tables: one for exact matches, one for patterns with 1 question mark, and one for patterns with 2 question marks.
Given a dictionary, I hash every possible pattern of each word. For example, for the word WORD I insert WORD, ?ORD, W?RD, WO?D, WOR?, ??RD, W??D, and WO?? into the tables. Collisions are chained in a linked list: if hash(W?RD) = hash(STR?NG) = 17, then hashtab[17] points to WORD, and WORD points to STRING.
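A minimal Ruby sketch of the key generation described above, assuming at most one `?` or one adjacent `??` pair per key. The method name `wildcard_keys` is hypothetical and not taken from the asker's code; each key would then go into the table that matches its number of question marks.

```ruby
# Hypothetical helper: returns every key under which +word+ is indexed:
# the word itself, each single-"?" variant, and each adjacent "??" variant.
def wildcard_keys(word)
  keys = [word]
  # One question mark: replace each position in turn
  word.length.times do |i|
    key = word.dup
    key[i] = "?"
    keys << key
  end
  # Two question marks: they are always adjacent, so replace each pair (i, i + 1)
  (word.length - 1).times do |i|
    key = word.dup
    key[i, 2] = "??"
    keys << key
  end
  keys
end

p wildcard_keys("WORD")
# => ["WORD", "?ORD", "W?RD", "WO?D", "WOR?", "??RD", "W??D", "WO??"]
```

A word of length n therefore produces 2n keys, which is what makes the precomputed tables grow quickly for long words.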

The average lookup of one word takes about 2e-6 s. I am looking to do better, preferably on the order of 1e-9 s. Inserting 3 million entries took 0.5 seconds, and 3 million lookups took 4 seconds.


1 Answer

  1. Editorial Team
    Answered: May 13, 2026 at 7:03 am

    I believe that in this case it is best to use a flat file with one word per line. That way you can use regular-expression search, which is highly optimized and will probably beat any data structure you could devise yourself for this problem.

    Solution #1: Using Regex

    This is working Ruby code for this problem:

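    # Yields each word in +data+ (one word per line) that matches +str+,
    # where every "?" in +str+ stands for exactly one character.
    def query(str, data)
      # Escape regex metacharacters, then turn each escaped "?" into "."
      pattern = Regexp.escape(str).gsub("\\?", ".")
      # In Ruby, ^ and $ match at line boundaries, so each line is one candidate
      r = Regexp.new("^#{pattern}$")
      idx = 0
      while (idx = data.index(r, idx))
        yield data[idx, str.size]
        idx += str.size + 1 # skip the matched word and its newline
      end
    end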
    
    start_time = Time.now
    query("?r?te", File.read("wordlist.txt")) do |w|
      puts w
    end
    puts Time.now - start_time
    

    The file wordlist.txt contains 45,425 words. The program’s output for the query ?r?te is:

    brute
    crate
    Crete
    grate
    irate
    prate
    write
    wrote
    0.013689
    

    So it takes only about 14 milliseconds to read the whole file and find all matches in it. It also scales well across very different query patterns, even ones where a Trie is very slow:

    query ????????????????e

    counterproductive
    indistinguishable
    microarchitecture
    microprogrammable
    0.018681
    

    query ?h?a?r?c?l?

    theatricals
    0.013608
    

    This looks fast enough for me.
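To make explicit what the regex (`^` and `$` around the pattern, with `.` in place of each `?`) actually tests, here is an equivalent plain-Ruby predicate. It is a hypothetical helper for illustration, not part of the solution above: a word matches when it has exactly the pattern's length and agrees with the pattern at every position that is not `?`.

```ruby
# Hypothetical helper: true if +word+ matches +pattern+, where each "?"
# in the pattern matches exactly one arbitrary character.
def wildcard_match?(pattern, word)
  return false unless pattern.length == word.length
  pattern.each_char.with_index.all? { |ch, i| ch == "?" || ch == word[i] }
end

words = %w[brute crate Crete grate irate prate write wrote these route]
p words.select { |w| wildcard_match?("?r?te", w) }
# => ["brute", "crate", "Crete", "grate", "irate", "prate", "write", "wrote"]
```

The regex version performs the same test, but inside Ruby's regex engine rather than in interpreted Ruby code, which is a large part of why it is fast.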

    Solution #2: Regex with Prepared Data

    If you want to go even faster, you can split the word list into strings that each hold words of a single length, and search only the one matching your query's length. Replace the last 5 lines with this code:

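    # Query only the words whose length equals the pattern's length
    def query_split(str, data)
      query(str, data[str.length]) { |w| yield w }
    end
    
    # prepare data: one newline-separated string per word length
    data = Hash.new { |hash, len| hash[len] = "".dup }
    File.foreach("wordlist.txt") do |line|
      word = line.chomp
      data[word.length] << word << "\n"
    end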
    
    # use prepared data for query
    start_time = Time.now
    query_split("?r?te", data) do |w|
      puts w
    end
    puts Time.now - start_time
    

    Building the data structure now takes about 0.4 seconds, but queries become roughly 10 to 100 times faster (the gain depends on how many words have the query's length):

    • ?r?te 0.001112 sec
    • ?h?a?r?c?l? 0.000852 sec
    • ????????????????e 0.000169 sec
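A variant of the same length-bucketing idea that skips regexes altogether, sketched under the assumption that keeping one Ruby array per word length is acceptable memory-wise: group the words by length once, then scan only the bucket for the query's length, comparing just the positions that are not `?`. The names `build_length_index` and `query_by_length` are hypothetical, and whether this beats the regex scan depends on your Ruby version and data, so it is worth measuring.

```ruby
# Hypothetical: group words by length so a query only scans candidates
# of the right length. +lines+ is any enumerable of newline-terminated words.
def build_length_index(lines)
  index = Hash.new { |h, len| h[len] = [] }
  lines.each do |line|
    word = line.chomp
    index[word.length] << word unless word.empty?
  end
  index
end

# Returns the words in the bucket for pattern.length that agree with
# the pattern at every non-"?" position.
def query_by_length(pattern, index)
  fixed = pattern.each_char.with_index.reject { |ch, _| ch == "?" }
  index.fetch(pattern.length, []).select do |word|
    fixed.all? { |ch, i| word[i] == ch }
  end
end

index = build_length_index(["food\n", "good\n", "hood\n", "goods\n", "mood\n", "wood\n", "feed\n"])
p query_by_length("?ood", index)
# => ["food", "good", "hood", "mood", "wood"]
```

With the word list from above, the index would be built with `build_length_index(File.foreach("wordlist.txt"))`, since `File.foreach` without a block returns an enumerator over the lines.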

    Solution #3: One Big Hashtable (Updated Requirements)

    Since you have changed your requirements, you can easily extend your idea into one big hashtable that holds all precomputed results. Instead of handling collisions yourself, you can rely on the performance of a properly implemented hashtable.

    Here I create one big hashtable, where each possible query maps to a list of its results:

    # Map every query pattern (one "?" or one adjacent "??") to the list of
    # words that match it.
    def create_big_hash(data)
      h = Hash.new { |hash, key| hash[key] = [] }
      data.each_line do |l|
        w = l.strip
        # add all words with one ?
        w.length.times do |i|
          q = w.dup
          q[i] = "?"
          h[q].push w
        end
        # add all words with two adjacent ??
        (w.length - 1).times do |i|
          q = w.dup
          q[i, 2] = "??"
          h[q].push w
        end
      end
      h
    end
    
    # prepare data    
    t = Time.new
    h = create_big_hash(File.read("wordlist.txt"))
    puts "#{Time.new - t} sec preparing data\n#{h.size} entries in big hash"
    
    # use prepared data for query
    t = Time.new
    # fetch avoids inserting an empty entry when the pattern has no matches
    h.fetch("?ood", []).each do |w|
      puts w
    end
    puts Time.new - t
    

    Output is

    4.960255 sec preparing data
    616745 entries in big hash
    food
    good
    hood
    mood
    wood
    2.0e-05
    

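    Each query is a single hashtable lookup, so it takes expected O(1) time regardless of the dictionary size. The 2.0e-05 seconds shown is a single measurement that also includes printing the results, so it is not reliable on its own. When running it 1000 times, I get an average of 1.958e-6 seconds per query. To make it faster, I would switch to C++ and use Google's sparsehash library, whose sparse_hash_map is extremely memory-efficient (its dense_hash_map variant trades memory for speed).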
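Timing a single lookup with `Time.new` mostly measures clock resolution and `puts`. A minimal benchmark sketch that averages over many iterations using Ruby's monotonic clock follows; the helper name `average_seconds` is hypothetical, and the small hash `h` is a stand-in (an assumption) for the one returned by `create_big_hash`, so the snippet runs on its own.

```ruby
# Hypothetical helper: average wall-clock seconds per call of the block,
# measured with the monotonic clock (unaffected by system time changes).
def average_seconds(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) / iterations
end

# Stand-in for the hash returned by create_big_hash (same shape assumed:
# pattern string => array of matching words).
h = { "?ood" => %w[food good hood mood wood] }

result = nil
per_query = average_seconds(100_000) { result = h.fetch("?ood", []) }
puts result.inspect
puts "#{per_query} sec per query"
```

The figure still includes the cost of `times` and `yield` per iteration, so it is an upper bound on the bare lookup cost rather than an exact value.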

    Solution #4: Get Really Serious

    All above solutions work and should be good enough for many use cases. If you really want to get serious and have lots of spare time on your hands, read some good papers:

    • Tries for Approximate String Matching – If well implemented, tries can have very compact memory requirements (50% less space than the dictionary itself), and are very fast.
    • Agrep – A Fast Approximate Pattern-Matching Tool – Agrep is based on a new efficient and flexible algorithm for approximate string matching.
    • Google Scholar search for approximate string matching – More than enough to read on this topic.
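For comparison with the trie approach mentioned in the question and in the first paper, here is a plain, uncompressed trie with `?` wildcard search in Ruby. It is only a sketch and not the compact trie the paper describes: nodes are ordinary Ruby hashes, so it will use far more memory. The class and method names are hypothetical.

```ruby
# Hypothetical minimal trie: each node is a Hash from character to child node;
# the key :end marks that a word terminates at this node.
class WildcardTrie
  def initialize
    @root = {}
  end

  def insert(word)
    node = word.each_char.reduce(@root) { |n, ch| n[ch] ||= {} }
    node[:end] = true
  end

  # Returns all stored words matching +pattern+ ("?" = any one character).
  def search(pattern)
    results = []
    walk(@root, pattern, 0, "", results)
    results
  end

  private

  def walk(node, pattern, depth, prefix, results)
    if depth == pattern.length
      results << prefix if node[:end]
      return
    end
    ch = pattern[depth]
    # A "?" branches into every child; a fixed character follows at most one edge
    children = ch == "?" ? node.reject { |k, _| k == :end } : node.slice(ch)
    children.each do |c, child|
      walk(child, pattern, depth + 1, prefix + c, results)
    end
  end
end

trie = WildcardTrie.new
%w[these those theme there thee three].each { |w| trie.insert(w) }
p trie.search("th??e")
# => ["these", "theme", "there", "those", "three"]
```

Each `?` forces the search into every branch at that depth, so a pattern that starts with many `?` characters visits a large part of the trie; this is the case where the answer notes that a Trie is very slow. `Hash#slice` requires Ruby 2.5 or later.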

© 2021 The Archive Base. All Rights Reserved