I just came up with an idea that I want to develop into an application: automatically distinguishing between the voices of different people.
Sample use case: after training on recordings of Obama and Romney, the application would be able to detect whenever either one speaks again (not necessarily saying the same content as in the training data).
I am wondering whether there is any existing research on this. (I don't know how to search for it. I tried a couple of keywords and got no significant results.)
If not, what is a good way to start? How should I choose the features, data representation, models, etc.?
Thanks!
I found the Speaker recognition article on Wikipedia, which in turn links to "An overview of text-independent speaker recognition: From features to supervectors" (Kinnunen and Li, 2010).
From the abstract of the paper: