I have a series of puzzles: strings of Morse code with no spaces between the letters or words. My plan is to run a dictionary attack to find the best solution candidates. My weapon is Python.
I have a list of 17,000 English words. I also have a much smaller list of words that are pertinent to the puzzle’s theme, and if any of those words show up, they should score higher.
So at the very beginning of my script, when I generate the word list, I build a list of tuples of the form (word, scoremultiplier). Here’s a small subset:
```python
[('zoned', 1.0),
 ('zonely', 1.0),
 ('zoner', 1.0),
 ('zones', 1.0),
 ('zoning', 1.0),
 ('zoo', 1.0),
 ('zoom', 1.0),
 ('zoomed', 1.0),
 ('zooming', 1.0),
 ('zooms', 1.0),
 ('zoos', 1.0),
 ('ten', 1.0),
 ('tens', 1.0),
 ('gnash', 1.0),
 ('shag', 1.0),
 ('75th', 2.0),
 ('seventy', 2.0),
 ('fifth', 2.0)]
```
In the file I parse all of that from, I want to simply append the high-value words at the end, without manually removing their duplicates from the main part of the file. So I need code that removes every earlier tuple whose first element (the word) equals the first element of a later tuple.
I can do this with brute force:
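```python
# Brute force: for each entry, scan every later entry for the same word.
for firstkey, (firstword, firstfactor) in enumerate(wordlist):
    for laterkey, (laterword, laterfactor) in enumerate(wordlist[firstkey+1:]):
        if firstword == laterword:
            # A later tuple has the same word, so drop this earlier one.
            # (Deleting while enumerating shifts the next item into
            # firstkey, so that item is never checked.)
            del wordlist[firstkey]
            break
```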
But that part of the script alone takes almost 45 seconds, and my 17,000 words aren’t even a full dictionary. (That code is also untested beyond timing how long it takes to finish, so it may not even work.) It also seems very un-Pythonic, though I’m only just learning Python (and doing some of my first programming at all) with this very project.
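For a sense of scale: in the worst case, when no word is duplicated, the nested loop above compares every pair of entries exactly once, and each wordlist[firstkey+1:] slice also copies the remaining tail of the list. With n entries, the number of inner-loop iterations is therefore quadratic in n:

```latex
\sum_{k=0}^{n-1} (n-1-k) = \frac{n(n-1)}{2},
\qquad n = 17000 \;\Rightarrow\; \frac{17000 \cdot 16999}{2} = 144{,}491{,}500
```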
Is there a better way to do this? I can’t use set() because the duplicate words belong to unequal tuples. Do I need to restructure my data somehow? Or should I just be prepared to wait a full minute every time I run this?
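Below is a minimal sketch of a single-pass alternative. It assumes the goal is exactly what the question describes: keep the last tuple for each word and discard earlier ones. The function name dedupe_keep_last and the sample data are hypothetical. It sidesteps the set() problem by putting only the words into a set rather than the whole tuples, so two tuples that differ only in their multiplier still count as the same word. Set membership checks take constant time on average, so the pass is linear in the length of the list rather than quadratic.

```python
def dedupe_keep_last(wordlist):
    """Return a new list with one (word, factor) tuple per word.

    For each word, the tuple that appears last in wordlist wins, and the
    surviving tuples keep the relative order of those last occurrences.
    """
    seen = set()   # words already kept (only the words, not whole tuples)
    kept = []
    # Walk backwards so the first time we meet a word is its last occurrence.
    for word, factor in reversed(wordlist):
        if word not in seen:
            seen.add(word)
            kept.append((word, factor))
    kept.reverse()  # restore front-to-back order
    return kept


# Hypothetical sample: themed words appended at the end with a higher factor.
wordlist = [
    ('ten', 1.0),
    ('zoo', 1.0),
    ('seventy', 1.0),
    ('fifth', 1.0),
    ('ten', 2.0),
    ('seventy', 2.0),
    ('fifth', 2.0),
]
print(dedupe_keep_last(wordlist))
# [('zoo', 1.0), ('ten', 2.0), ('seventy', 2.0), ('fifth', 2.0)]
```

Because each surviving tuple stays where its last occurrence was, the high-value words appended at the end of the file remain at the end of the result.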
I might be misunderstanding the question, but it looks like you can generate a dict from the list of tuples. Later values will automatically overwrite earlier ones: