I am trying to write a hangman algorithm. My idea for it goes like this:
- Pre-process a dictionary that contains the relative letter frequencies of words depending on their length. Step complete.
Example:
#Each key corresponds to length of the word.
frequencyDict = {2: ['a', 'o', 'e', 'i', 'm', 'h', 'n', 'u', 's', 't', 'y', 'b', 'd', 'l', 'p', 'x', 'f', 'r', 'w', 'g', 'k', 'j'],
3: ['a', 'e', 'o', 'i', 't', 's', 'u', 'p', 'r', 'n', 'd', 'b', 'm', 'g', 'y', 'l', 'h', 'w', 'f', 'c', 'k', 'x', 'v', 'j', 'z', 'q'],
4: ['e', 'a', 's', 'o', 'i', 'l', 'r', 't', 'n', 'u', 'd', 'p', 'm', 'h', 'b', 'c', 'g', 'k', 'y', 'f', 'w', 'v', 'j', 'z', 'x', 'q'],
5: ['s', 'e', 'a', 'o', 'r', 'i', 'l', 't', 'n', 'd', 'u', 'c', 'p', 'y', 'm', 'h', 'g', 'b', 'k', 'f', 'w', 'v', 'z', 'x', 'j', 'q'],
6: ['e', 's', 'a', 'r', 'i', 'o', 'l', 'n', 't', 'd', 'u', 'c', 'p', 'm', 'g', 'h', 'b', 'y', 'f', 'k', 'w', 'v', 'z', 'x', 'j', 'q'],
7: ['e', 's', 'a', 'i', 'r', 'n', 'o', 't', 'l', 'd', 'u', 'c', 'g', 'p', 'm', 'h', 'b', 'y', 'f', 'k', 'w', 'v', 'z', 'x', 'j', 'q'],
8: ['e', 's', 'i', 'a', 'r', 'n', 'o', 't', 'l', 'd', 'c', 'u', 'g', 'p', 'm', 'h', 'b', 'y', 'f', 'k', 'w', 'v', 'z', 'x', 'q', 'j']}
I also have a generator of words in a dictionary:
dictionary = word_reader('C:\\Python27\\dictionary.txt', len(letters))
Which is based on this function
# Strips words that are too big or too small from the dictionary
def word_reader(filename, L):
    L2 = L + 2
    return (word.strip() for word in open(filename)
            if len(word) < L2 and len(word) > 2)
- This particular game will give you the last vowel for free. If the word was earthen, for example,
the user would be given the following board: e----e- to guess. So, I want to find a way to create a new generator or list
with all the words stripped out of it that do not conform to the e----e- template.
p = re.compile('^e\D\D\D\De\D$', re.IGNORECASE) will do it, but it might find words
that contain ‘e’s in other places besides the first letter and second to last letter.
So my first questions are:
- How do I ensure that an 'e' is located ONLY in the first and the second-to-last position?
- How do I create an intelligent function that will build a new regex as the puzzle updates and the computer keeps making its guesses?
For example, if the word is monkey, the computer would just be given ----e-
The first step would be for it to strip from its dictionary all words that are not 6 letters long, and all words that do not conform perfectly to the '----e-' template, and put the remaining words in a newList. How do
I go about doing this?
It then computes a NEW frequencyDict based on the relative frequency of the letters in the words that are in its
newList.
My current method of doing this looks like this:
from collections import Counter

cnt = Counter()
for word in dictionary:
    for letter in word:
        cnt[letter] += 1
Is this the most efficient way?
It would then use the newfrequencyDict to guess the most common letter, assuming it has
not already been guessed. It continues to do this until (hopefully) the word is guessed.
Is this an efficient algorithm? Are there better implementations?
There’s nothing particularly magical about regexes, and matching them against your whole dictionary is still going to take O(n) time. I’d recommend writing your own function that determines if a word is a match for a template, and running your dictionary-so-far through that.
Here’s an example function:
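A minimal sketch of such a matcher, assuming the template uses '-' for unknown positions (as in e----e-) and that every occurrence of a revealed letter is shown on the board. The name matches_template and the excluded parameter (letters already guessed and known to be absent) are illustrative choices, not part of the original code.

```python
def matches_template(word, template, excluded=frozenset(), blank="-"):
    """Return True if `word` could be the hidden word behind `template`."""
    if len(word) != len(template):
        return False
    word = word.lower()
    template = template.lower()
    # Letters already shown on the board; they cannot hide behind a blank.
    revealed = set(template) - {blank}
    for w, t in zip(word, template):
        if t == blank:
            # A blank may not hold a revealed letter or a known-wrong guess.
            if w in revealed or w in excluded:
                return False
        elif w != t:
            # A revealed position must match exactly.
            return False
    return True
```

Filtering is then a single pass over the remaining candidates, for example `candidates = [w for w in candidates if matches_template(w, "e----e-")]`, repeated with the updated template after each guess.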
As far as determining the next character to guess, you probably don’t want to select the most frequent character. Instead, you want to select the character that comes closest to being in 50% of words, meaning you eliminate the most possibilities either way. Even that isn’t optimal – it could be that certain characters are more likely to occur twice in the word, and therefore eliminate a larger proportion of candidates – but it’s closer.
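The 50% heuristic can be sketched roughly as follows. This assumes candidates is the already-filtered word list and guessed holds every letter tried or revealed so far; best_guess is a hypothetical helper name. Each letter is counted once per word, and repeated letters are ignored, which is the simplification mentioned above.

```python
from collections import Counter

def best_guess(candidates, guessed=()):
    """Pick the unguessed letter whose presence splits the candidates most evenly."""
    guessed = set(guessed)
    counts = Counter()
    for word in candidates:
        # set(word) counts a letter once per word, however often it repeats.
        counts.update(set(word) - guessed)
    if not counts:
        return None  # no candidates left, or every letter already guessed
    half = len(candidates) / 2.0
    # Closest to half the words wins; ties are broken alphabetically.
    return min(counts, key=lambda c: (abs(counts[c] - half), c))
```

For the candidates monkey, turkey, donkey and hockey with 'e' already revealed, this picks 'n', which appears in exactly two of the four words, rather than 'k' or 'y', which appear in all four and would eliminate nothing.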