I’m a Computer Science student and have to complete my final year project. I am looking for title suggestions as this actually seems to be the most complicated aspect of the whole project. Basically, my project is:
- Developing an algorithm that analyses speech, specifically detecting whether someone is saying “Yes” or “No” and then determining whether the speaker is male or female.
@mmoment
I’m going to develop an algorithm that determines whether a sample (a person’s voice) is saying “Yes” or “No”. The algorithm will work by splitting the sample into blocks, finding the zero-crossings of each block, and then using HMMs to decide between “Yes” and “No”. I believe this can be done by picking out phones. For example, if the phone “Y” is picked out, we can infer that the word is “Yes”; if the phone is not “Y”, or is “N”, we can infer that the word is “No”. Does this make sense?
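To make the first step concrete, here is a minimal Python sketch of the block-splitting and zero-crossing count only (no HMM stage). The function name `zero_crossing_counts`, the block size of 400 samples, and the synthetic 440 Hz tone standing in for a recorded voice are all assumptions made for illustration.

```python
import math

def zero_crossing_counts(samples, block_size):
    """Split samples into consecutive blocks and count sign changes in each."""
    counts = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        crossings = 0
        for prev, cur in zip(block, block[1:]):
            # A crossing occurs when consecutive samples have opposite signs.
            if (prev >= 0) != (cur >= 0):
                crossings += 1
        counts.append(crossings)
    return counts

# Synthetic test tone standing in for recorded speech (an assumption).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
counts = zero_crossing_counts(tone, block_size=400)
```

The per-block counts would then be one possible feature sequence to feed into the HMM stage.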
I know there are some issues with this approach, and before I submit my final proposal I will iron them out and hopefully come to a final decision.
I hope someone can help me :)!
A friend of mine worked on a similar project and evaluated the use of hidden Markov models for speech recognition:
http://en.wikipedia.org/wiki/Hidden_Markov_model
Also, the use of MATLAB is highly recommended (it requires some special toolboxes, though).
@Phorce instead of going for the zero-crossings, why not transform the time-domain signal into another domain (you would do that anyway, into the discrete Fourier domain) and check for fundamental frequencies and occurrences of significant harmonic frequencies? That should do the trick for telling YES from NO.
This would make the amplitudes largely irrelevant, and the distinction would be fairly easy.
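As a rough sketch of that idea, the Python below computes a naive discrete Fourier transform of one frame and reports the strongest frequency bin. The function name `dominant_frequency`, the 8 kHz sample rate, and the synthetic frame (a 200 Hz tone plus one weaker harmonic) are assumptions for illustration. A real system would use an FFT routine rather than this O(n²) loop.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    # Only bins 1 .. n/2 are needed for a real-valued signal (bin 0 is DC).
    for k in range(1, n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n

# Synthetic frame: 200 Hz fundamental plus a weaker 400 Hz harmonic (assumed data).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
         for t in range(400)]
peak_hz = dominant_frequency(frame, sample_rate)
```

With 400 samples at 8 kHz the bins are 20 Hz apart, so finer resolution needs longer frames.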
The real task here is to handle different pronunciations, which would probably just show up as a stretch of the signal / bandwidth or a shift in the frequency domain.
The shift in the frequency domain will also help to distinguish female from male speakers, but you will probably have to use some form of neural network so that your system can learn and adapt to different speakers, voices, etc.
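Before going as far as a neural network, a very crude baseline for the male/female split could look like the Python sketch below. It swaps in a different technique from the Fourier approach above: it estimates the pitch from the autocorrelation peak and compares it with a fixed threshold. The function names, the 60 to 400 Hz search range, and the 165 Hz threshold are illustrative assumptions, not tuned values, and the test tones are synthetic.

```python
import math

def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def guess_speaker(pitch_hz, threshold_hz=165.0):
    # threshold_hz is an illustrative assumption, not a tuned value.
    return "female" if pitch_hz >= threshold_hz else "male"

# Synthetic tones standing in for a low-pitched and a high-pitched voice.
sample_rate = 8000
low = [math.sin(2 * math.pi * 120 * n / sample_rate) for n in range(800)]
high = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(800)]
pitch_low = estimate_pitch(low, sample_rate)
pitch_high = estimate_pitch(high, sample_rate)
```

Real voices overlap around any single threshold, which is where a learned classifier would come in.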