I have some Morse code that has lost the spaces between the letters, and my challenge is to find out what the message says. So far I have been kind of lost because of the sheer number of possible combinations.
Here is all the info I have on the messages:
- The output will be English
- There will always be a translation that makes sense
- Here is an example message:
  -..-...-...-...-..-.-.-.-.-..-.-.-.-.-.-.-.-.-.-..-...-.
- The messages should be no longer than 70 characters
- The Morse code was taken from a longer stream, so it is possible that the first or last group is cut off and hence has no valid translation
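For a sense of scale, here is a small sketch (Python, assuming only the 26 standard International Morse letter codes, no digits or punctuation) that counts how many ways a gapless string can be cut into letters, before even asking whether those letters spell words. `count_letter_splits` is just an illustrative name.

```python
# International Morse code for the 26 letters (digits and punctuation omitted).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}
CODES = set(MORSE.values())
MAX_CODE = max(len(code) for code in CODES)  # 4 for the letter codes


def count_letter_splits(message: str) -> int:
    """Count the ways to cut a gapless Morse string into letter codes."""
    n = len(message)
    ways = [0] * (n + 1)
    ways[n] = 1  # the empty suffix has exactly one (empty) split
    for i in range(n - 1, -1, -1):
        # try every letter code that could start at position i
        for j in range(i + 1, min(n, i + MAX_CODE) + 1):
            if message[i:j] in CODES:
                ways[i] += ways[j]
    return ways[0]


example = "-..-...-...-...-..-.-.-.-.-..-.-.-.-.-.-.-.-.-.-..-...-."
print(count_letter_splits(example))
```

Since every string of dots and dashes can at least be read as a run of E and T, the count is never zero, and it grows roughly exponentially with the length of the message.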
Does anyone have a clever solution?
This is not an easy problem because, as ruakh suggested, there are many viable sentences for a given message. For example, ‘JACK AND JILL WENT UP THE HILL’ has the same encoding as ‘JACK AND JILL WALK CHISELED’. Since these are both grammatical sentences and the words in each are common, it’s not obvious how to pick one or the other (or any other of the 40141055989476564163599 different sequences of English words that have the same encoding as this message) without delving into natural language processing.
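A quick way to check that claim is to encode both sentences with every gap removed and compare the results. A minimal Python sketch, assuming the standard International Morse letter codes; `encode` is an illustrative helper that simply drops anything that isn't A to Z (spaces included):

```python
# International Morse code for the 26 letters (digits and punctuation omitted).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}


def encode(text: str) -> str:
    """Encode text as Morse with all letter and word gaps removed."""
    return "".join(MORSE[c] for c in text.upper() if c in MORSE)


a = encode("JACK AND JILL WENT UP THE HILL")
b = encode("JACK AND JILL WALK CHISELED")
print(a == b)  # True
```

The two tails line up symbol for symbol: "WENT UP THE HILL" and "WALK CHISELED" both encode to the same 34-symbol string.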
Anyway, here’s a dynamic programming solution to the problem of finding the shortest sentence (with the fewest characters if there’s a tie). It can also count the total number of sentences that have the same encoding as the given message. It needs a dictionary of English words in a file.
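A minimal Python sketch of that kind of dynamic program, under a few assumptions of my own: "shortest" is taken to mean fewest words (ties broken by fewest letters, then alphabetically), dictionary words containing anything other than A to Z are skipped, and `build_index`/`solve` are illustrative names. It works right to left: `best[i]` is the cheapest parse of the suffix starting at symbol `i`, and `count[i]` is the number of word sequences that encode that suffix.

```python
from collections import defaultdict

# International Morse code for the 26 letters (digits and punctuation omitted).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}


def encode(text):
    """Encode text as Morse with all letter and word gaps removed."""
    return "".join(MORSE[c] for c in text.upper() if c in MORSE)


def build_index(words):
    """Map each Morse encoding to the set of dictionary words that produce it."""
    index = defaultdict(set)
    for w in words:
        w = w.strip().upper()
        if w and all(c in MORSE for c in w):  # skip apostrophes, digits, etc.
            index[encode(w)].add(w)
    return index


def solve(message, words):
    """Return (best sentence or None, number of word sequences encoding message)."""
    index = build_index(words)
    n = len(message)
    max_len = max((len(code) for code in index), default=0)
    # best[i] = (word count, letter count, words) for the cheapest parse of message[i:]
    best = [None] * (n + 1)
    count = [0] * (n + 1)  # count[i] = number of parses of message[i:]
    best[n], count[n] = (0, 0, ()), 1
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + max_len) + 1):
            candidates = index.get(message[i:j])
            if not candidates or best[j] is None:
                continue  # no word fits here, or the remainder can't be parsed
            count[i] += len(candidates) * count[j]
            k, letters, rest = best[j]
            for w in candidates:
                option = (k + 1, letters + len(w), (w,) + rest)
                if best[i] is None or option < best[i]:
                    best[i] = option
    sentence = " ".join(best[0][2]) if best[0] is not None else None
    return sentence, count[0]


words = ["JACK", "AND", "JILL", "WENT", "UP", "THE", "HILL", "WALK", "CHISELED"]
message = encode("JACK AND JILL WENT UP THE HILL")
print(solve(message, words))
```

`words` can be any iterable of lines, such as an open file handle with one word per line; the `strip()` call removes the newlines. Python integers don't overflow, so the count stays exact even when it reaches numbers like the one quoted above.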
The next enhancement should be a better measure of how likely a sentence is: perhaps word frequencies, or false-positive rates in Morse (e.g., “I” is a common word, but its code “..” also turns up often as a fragment of other Morse sequences). The tricky part will be formulating a good score function that can still be computed using dynamic programming.
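One score that does fit the dynamic-programming mould is a unigram word-frequency model. If words are assumed independent, a sentence's log-probability is the sum of its words' log-probabilities, so the same right-to-left recurrence works with "maximize the sum" in place of "minimize the word count" (a Viterbi-style search). A sketch, where `freqs` is a hypothetical word-to-count mapping (for example, built from a corpus) and `most_likely` is an illustrative name; it does not attempt the false-positive idea:

```python
import math
from collections import defaultdict

# International Morse code for the 26 letters (digits and punctuation omitted).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}


def encode(text):
    """Encode text as Morse with all letter and word gaps removed."""
    return "".join(MORSE[c] for c in text.upper() if c in MORSE)


def most_likely(message, freqs):
    """Return the word list maximizing the summed log unigram probability, or None."""
    total = sum(freqs.values())
    index = defaultdict(list)  # Morse code -> [(log-probability, word), ...]
    for word, c in freqs.items():
        # assumption: only A-Z words with positive counts are usable
        if c > 0 and word and all(ch in MORSE for ch in word.upper()):
            index[encode(word)].append((math.log(c / total), word))
    n = len(message)
    max_len = max((len(code) for code in index), default=0)
    # best[i] = (log-probability, words) of the most likely parse of message[i:]
    best = [None] * (n + 1)
    best[n] = (0.0, [])
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if best[j] is None:
                continue  # the remainder can't be parsed
            for logp, word in index.get(message[i:j], ()):
                score = logp + best[j][0]
                if best[i] is None or score > best[i][0]:
                    best[i] = (score, [word] + best[j][1])
    return best[0][1] if best[0] is not None else None
```

Because every log-probability is negative, each extra word costs something, so this score still leans toward shorter sentences, but a very common word can beat a rarer single word with the same code.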