I was reading this guide on speech recognition, and it mentioned that I need three items for speech recognition: an acoustic model, a language model, and a phonetic dictionary.
I wanted to start playing with this Python demo, which uses GStreamer to capture from the mic and resample to 8 kHz, 16-bit PCM audio.
I see that I can specify the language model and phonetic dictionary, and I use the ones provided by CMU:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20HUB4%20Language%20Model/
But I am confused about where I should specify the acoustic model. Does GStreamer have its own acoustic model that I’m implicitly using? I was hoping to use the acoustic model provided here for slightly better results:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20HUB4%20Acoustic%20Model/
(Sorry about the hyperlinks. I can’t post more than 2 links with rep less than 10)
You can specify the acoustic model with the hmm property of the GStreamer element, as covered in the tutorial.
You can use
Yes, by default it uses the US English model hub4wsj_sc_8k from the distribution.
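To illustrate setting the hmm property alongside the language model and dictionary, here is a minimal sketch that builds a gst-launch style pipeline description. It assumes the element is named pocketsphinx and exposes hmm, lm and dict properties; the source elements (autoaudiosrc, audioconvert, audioresample, fakesink) and all paths are placeholders I picked for illustration, and the helper name build_pipeline is hypothetical. Your demo may use a different source element or extra stages, so adapt it to what you already have.

```python
def build_pipeline(hmm_dir, lm_file, dict_file):
    """Return a pipeline description string with the acoustic model,
    language model, and dictionary set on the recognizer element.

    Assumption: the recognizer element is called 'pocketsphinx' and
    accepts 'hmm', 'lm' and 'dict' properties.
    """
    # hmm points at the acoustic model directory; lm and dict are files.
    asr = 'pocketsphinx name=asr hmm="{}" lm="{}" dict="{}"'.format(
        hmm_dir, lm_file, dict_file
    )
    # The surrounding elements are placeholders for the capture chain.
    elements = ["autoaudiosrc", "audioconvert", "audioresample", asr, "fakesink"]
    return " ! ".join(elements)


if __name__ == "__main__":
    # Placeholder paths: replace with wherever you unpacked the models.
    description = build_pipeline(
        "/path/to/acoustic_model_dir",
        "/path/to/language_model.DMP",
        "/path/to/phonetic.dic",
    )
    print(description)
    # The string can then be handed to the parse_launch function of your
    # GStreamer Python bindings to create the actual pipeline.
```

If you leave hmm out, the element falls back to its default acoustic model, which is why the demo works without it.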