I am stuck trying to figure out how to do string hashing with linear probing.
Basically, the idea is to hash every string from a dictionary (90000 words), and retrieve anagrams of selected words.
Here’s what I did:
- Created a hash table of size 2*90000.
- Using a simple hash function, I hash each word from the dictionary and get a value.
- Check whether that hash table index is empty. If it is, assign the value; if not, generate a new hash value.
- After every word is in the hash table, I perform a search.
- The search word gets a hash value from the hash function, and I check whether that value exists in the hash table.
- If it exists, I compare the strings using permutations. If there is a match, it is output; if not, the search keeps going with a new hash value.
The problem is that the whole process is extremely slow. It indexes fine, but searching takes a REALLY long time.
I am out of ideas on how to make this faster.
Thank you for your time reading this.
Put all the letters in alphabetical order first, then hash the result with any hashing algorithm you please (crc32, md5sum, sha1, count the vowels, anything… though counting the vowels will lead to a less-efficient solution), and store the word as a leaf node on that hash entry (in a linked list, obviously). Take the hash result mod 2^x to limit the table to 2^x buckets. Because every anagram of a word has the same letters in alphabetical order, all anagrams end up in the same bucket.
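To make the "alphabetize, then hash" step concrete, here is a rough Python sketch. The function names `signature` and `bucket_index`, the choice of CRC32, and the default of 17 bits (2^17 = 131072 buckets, enough room for 90000 words) are all assumptions for illustration, not something the steps above prescribe.

```python
import zlib


def signature(word: str) -> str:
    """Return the word's letters in alphabetical order.

    All anagrams of a word share the same signature.
    """
    return "".join(sorted(word.lower()))


def bucket_index(word: str, bits: int = 17) -> int:
    """Hash the signature with CRC32 and keep the low `bits` bits.

    Masking with 2**bits - 1 is the same as taking the hash mod 2**bits,
    so the result always falls in range(2**bits).
    """
    digest = zlib.crc32(signature(word).encode("utf-8"))
    return digest & ((1 << bits) - 1)
```

The point is that the hash is computed from the sorted letters rather than the word itself, so "listen" and "silent" always produce the same bucket index.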
Then, when you go to find an anagram, do the exact same “insert” procedure on your test word: alphabetize the letters, then run the result through your same hash function. Then for each leaf node in that bucket, compare the test word’s alphabetized letter list with the saved word’s alphabetized list. Each match is an anagram.
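Putting insert and lookup together, a minimal sketch might look like the following. The class name `AnagramTable` and its methods are made up for this example, CRC32 is just one of the hash choices mentioned above, and plain Python lists stand in for the linked lists; treat it as one possible shape, not the definitive one.

```python
import zlib


class AnagramTable:
    """Chained hash table keyed on each word's alphabetized letters."""

    def __init__(self, bits=17):
        # 2**bits buckets; each bucket is a list standing in for a linked list.
        self.mask = (1 << bits) - 1
        self.buckets = [[] for _ in range(1 << bits)]

    @staticmethod
    def _signature(word):
        # Anagrams share the same sorted-letter string.
        return "".join(sorted(word.lower()))

    def _bucket(self, sig):
        # Hash the signature and mask it down to a bucket index.
        return self.buckets[zlib.crc32(sig.encode("utf-8")) & self.mask]

    def insert(self, word):
        # Store the signature next to the word so lookups need not re-sort.
        sig = self._signature(word)
        self._bucket(sig).append((sig, word))

    def anagrams(self, word):
        # Unrelated words can share a bucket, so compare signatures.
        # Note: the query word itself is returned if it was inserted.
        sig = self._signature(word)
        return [w for s, w in self._bucket(sig) if s == sig]
```

A quick usage example:

```python
table = AnagramTable()
for w in ["listen", "silent", "enlist", "google", "banana"]:
    table.insert(w)
print(table.anagrams("tinsel"))
```

Each lookup sorts one word and scans one bucket, so no permutations of the search word are ever generated.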
(I normally don’t like to give homework help, but this one was too tempting. Now I kind of want to go write a fun little program to find all the anagrams in a given dictionary.)