I have a project where I want to add voice recognition to a website.
Imagine the user is on a video phone call and has no hand free to interact.
It would be sufficient if it recognized only a few keywords (like “snapshot” or “menu”).
I got it to work in Chrome (x-webkit-speech), but it has to work in IE8.
Other conditions:
- If possible, the voice recording should run all the time (starting as soon as the homepage is opened). So even while the user is talking to another person, it should react when it hears a keyword. I don’t want something like Siri, where you push a button to start recording.
- The phone call is very confidential. The firm I’m doing this for does not want to send the whole conversation to Google, where it could be analysed and the content stored.
I don’t expect anyone to give me a full solution, but since I’m really new to this and in a hurry, I would appreciate it if someone could point me in the right direction 🙂
Thank you!
I just stumbled on my own question…
Here is my solution:
We recorded the sound with Flash.
Converted it to .wav.
Connected to a socket on a C# server and sent the file as a byte array.
Problem:
Since the server only receives a stream of bytes, the client has to send the size first, terminated by some kind of delimiter character, so the server knows where the file ends.
You can’t just pick a random character and terminate the .wav file with it, because the audio data can contain any byte value. You’d never know when the transmission is done if you don’t get the size first.
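For anyone running into the same problem, here is a minimal sketch of that size-first framing. It is written in Python for brevity and is not our actual Flash/C# code. The newline delimiter after the size and the names frame, read_exact and read_frame are assumptions made for this example.

```python
import io


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its size as ASCII digits, terminated by a newline."""
    return str(len(payload)).encode("ascii") + b"\n" + payload


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes from a file-like stream, or raise if it ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream) -> bytes:
    """Read the size header first, then exactly that many payload bytes."""
    header = stream.readline()
    if not header.endswith(b"\n"):
        raise EOFError("stream closed before the size header arrived")
    size = int(header.decode("ascii").strip())
    return read_exact(stream, size)


if __name__ == "__main__":
    # The payload deliberately contains newline bytes: only the size header
    # is delimited, so the binary data itself can hold any byte value.
    fake_wav = b"RIFF\n\x00\n" + bytes(range(256))
    stream = io.BytesIO(frame(fake_wav) + frame(b"next file"))
    assert read_frame(stream) == fake_wav
    assert read_frame(stream) == b"next file"
```

With a real socket, the same read_frame should work on the file object returned by sock.makefile("rb").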
The C# server ran on .NET 4.?, which has voice recognition.
It analysed the .wav file and sent the recognized string back to the client.
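Once there is a recognized string, picking out the keywords is the easy part. A rough sketch, again in Python rather than our real code: the KEYWORDS table and the handler names takeSnapshot and openMenu are made up for illustration, and the recognition itself is not shown.

```python
import re

# Hypothetical mapping from spoken keyword to the name of a page function.
KEYWORDS = {
    "snapshot": "takeSnapshot",
    "menu": "openMenu",
}


def find_commands(recognized_text: str) -> list:
    """Return the handler names for every keyword found, in spoken order."""
    words = re.findall(r"[a-z]+", recognized_text.lower())
    return [KEYWORDS[word] for word in words if word in KEYWORDS]


if __name__ == "__main__":
    print(find_commands("Please take a Snapshot, then open the menu."))
```

Everything that is not a keyword is simply ignored, so the ongoing conversation never triggers anything unless one of the keywords is spoken.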
Flash can invoke methods in JavaScript => problem solved!
Of course this is WAY ugly, but our customer was still very happy with it, because it worked and fulfilled all the conditions he asked for.