I have 10 arbitrary letters and need to find the longest matching word in a words file
- I started learning regular expressions (RE) only recently, and I can’t seem to find a suitable pattern.
  - My first idea was to use a character set, [10 chars], but that also matches repeated uses of the included characters, and I don’t know how to avoid that.
- I started learning Python recently too, before RE, so maybe RE is not needed and this can be solved without it.
  - Using a “for this in that:” loop seems inappropriate, but maybe itertools (which I’m not familiar with) can do it easily.

I guess the solution is known even to novice programmers/scripters, but not to me.
Thanks
I’m guessing this is something like finding possible words given a set of Scrabble tiles, so that a character can be repeated only as many times as it is repeated in the original list.
The trick is to efficiently test each character of each word in your word file against a set containing your source letters. For each character, if found in the test set, remove it from the test set and proceed; otherwise, the word is not a match, and go on to the next word.
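Written as a plain loop, the test described above might look like the sketch below. The names `can_build` and `pool` are made up for illustration, and the sketch assumes the source letters are kept in a list rather than a true set, so that a letter appearing twice in the source can be used twice.

```python
def can_build(word, letters):
    """Return True if word can be spelled using each source letter at most once."""
    pool = list(letters)      # fresh, mutable copy of the source letters
    for c in word:
        if c in pool:
            pool.remove(c)    # use up this letter so it cannot match again
        else:
            return False      # letter is missing or already used up
    return True
```

With this sketch, `can_build("mom", "om")` is False because only one ‘m’ is available, while `can_build("mom", "omm")` is True.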
Python has a nice function, `all`, for testing a set of conditions based on elements in a sequence. `all` has the added feature that it will “short-circuit”, that is, as soon as one item fails the condition, no more tests are done. So if the first letter of your candidate word is ‘z’, and there is no ‘z’ in your source letters, then there is no point in testing any more letters in the candidate word.

My first shot at writing this was simply:
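A minimal sketch of what such a first attempt could look like, assuming a set named `testset` built from the source letters. The example letters and the function name `naive_match` are made up for illustration.

```python
letters = "etaoinshrm"      # example source letters (made up)
testset = set(letters)

def naive_match(word):
    # True only if every character of word occurs somewhere in testset;
    # all() stops at the first character that fails the test.
    return all(c in testset for c in word)

print(naive_match("zebra"))    # False: 'z' is not among the source letters
print(naive_match("mammoth"))  # True, even though only one 'm' is available
```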
Unfortunately, the bug here is that if the source letters contained a single ‘m’, a word with several ‘m’s would erroneously match, since each ‘m’ would separately match the given ‘m’ in the source testset. So I needed to remove each letter as it was matched.
I took advantage of the fact that `set.remove(item)` returns None, which Python treats as a Boolean False, and expanded the generator expression used in calling `all`. For each c in word, if it is found in testset, I want to additionally remove it from testset, so the condition becomes something like (pseudo-code, not valid Python): c in testset and “remove c from testset”.

Since set.remove returns None, I can replace the quoted bit above with “not testset.remove(c)”, and now I have a valid Python expression:
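A sketch of the resulting expression, wrapped in a hypothetical helper named `matches_letters`. One assumption differs from the wording above: the sketch stores the letters in a list rather than a set, because a set would collapse repeated source letters into one. `list.remove` also returns None, so the same trick applies.

```python
def matches_letters(word, letters):
    testset = list(letters)   # list, not set, so repeated source letters survive
    # "c in testset" checks availability; "not testset.remove(c)" removes the
    # letter and evaluates to True, because remove() returns None.
    return all(c in testset and not testset.remove(c) for c in word)
```

Because `and` short-circuits, `remove` is only called when the letter is actually present, so it never raises an error for a missing item.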
Now we just need to wrap that in a loop that checks each word in the list (be sure to build a fresh testset before checking each word, since our `all` test has now become a destructive test).

The final step is to sort the matches by descending length. We can pass a key function to sort. The builtin `len` would be good, but that would sort by ascending length. To change it to a descending sort, we use a lambda to give us not `len`, but `-1 * len`.

Now you can just print out the longest word, at matches[0], or iterate over all matches and print them out.
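Putting the loop and the sort together, a sketch might look like this. The function name `find_matches`, the sample word list, and the sample letters are all made up for illustration; in practice the words would be read from the word file. As in any version that must honor repeated source letters, the letters are held in a list.

```python
def find_matches(words, letters):
    matches = []
    for word in words:
        testset = list(letters)   # fresh copy for every word: the test is destructive
        if all(c in testset and not testset.remove(c) for c in word):
            matches.append(word)
    # Negating the length turns the default ascending sort into a descending one.
    matches.sort(key=lambda w: -1 * len(w))
    return matches

words = ["moth", "mammoth", "zebra", "oat", "hermit"]   # stand-in for the word file
matches = find_matches(words, "etaoinshrm")
print(matches[0])   # hermit
```

Calling `matches.sort(key=len, reverse=True)` is another way to get the same descending order.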
(I was surprised that this brute force approach runs so well. I used the 2of12inf.txt word list, containing over 80,000 words, and for a list of 10 characters, I get back the list of matches in about 0.8 seconds on my little 1.99GHz laptop.)