Say you want to take input from CMU’s phonetic data set, which looks like this:
ABERRATION AE2 B ER0 EY1 SH AH0 N
ABERRATIONAL AE2 B ER0 EY1 SH AH0 N AH0 L
ABERRATIONS AE2 B ER0 EY1 SH AH0 N Z
ABERT AE1 B ER0 T
ABET AH0 B EH1 T
ABETTED AH0 B EH1 T IH0 D
ABETTING AH0 B EH1 T IH0 NG
ABEX EY1 B EH0 K S
ABEYANCE AH0 B EY1 AH0 N S
(The word is on the left; to its right is its series of phonemes, key here)
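To make the format concrete, here is a minimal sketch of reading such lines into a word-to-phonemes mapping. The function name `parse_cmudict` is hypothetical, and the sketch assumes one entry per line, whitespace-separated fields, and that lines starting with `;;;` are comments to skip.

```python
def parse_cmudict(text):
    """Parse dictionary-style lines into {word: [phoneme, ...]}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and ';;;' comment lines carry no entries.
        if not line or line.startswith(";;;"):
            continue
        # First field is the word, the remaining fields are its phonemes.
        word, *phonemes = line.split()
        entries[word] = phonemes
    return entries


sample = """\
ABERT AE1 B ER0 T
ABET AH0 B EH1 T
ABEX EY1 B EH0 K S
"""

pronunciations = parse_cmudict(sample)
print(pronunciations["ABET"])  # ['AH0', 'B', 'EH1', 'T']
```

The digit on a vowel phoneme (0, 1, 2) is kept as part of the token here; whether to strip it is a modelling choice.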
And you want to use it as training data for a machine learning system that would take new words and guess how they would be pronounced in English.
It’s not so obvious, to me at least, because there isn’t a fixed number of letters that could possibly map to a phoneme. I have a feeling that something to do with a Markov chain might be the right way to go.
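A quick way to see the alignment problem is to compare letter counts with phoneme counts for a few of the entries above. This is only an illustrative sketch; the variable names are made up for the example.

```python
# Three entries copied from the sample data above.
entries = {
    "ABERRATION": "AE2 B ER0 EY1 SH AH0 N".split(),
    "ABET": "AH0 B EH1 T".split(),
    "ABEX": "EY1 B EH0 K S".split(),
}

# Map each word to (number of letters, number of phonemes).
length_pairs = {
    word: (len(word), len(phonemes)) for word, phonemes in entries.items()
}

for word, (n_letters, n_phonemes) in length_pairs.items():
    print(f"{word}: {n_letters} letters, {n_phonemes} phonemes")
# ABERRATION: 10 letters, 7 phonemes
# ABET: 4 letters, 4 phonemes
# ABEX: 4 letters, 5 phonemes
```

So several letters can collapse into one phoneme (the "TI" in ABERRATION becomes SH), and one letter can expand into two (the "X" in ABEX becomes K S), which is why a fixed-width mapping does not work.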
How would you do this?
The problem is called grapheme-to-phoneme conversion, a subproblem of natural language processing. Googling that term brings up a few papers.