I’m working on implementing speech recognition in my call center. I am using Microsoft

Question

Asked: June 8, 2026 (2026-06-08T02:26:16+00:00)

I’m working on implementing speech recognition in my call center. I am using Microsoft Speech Platform, and I want to replace my DTMF recognition with speech recognition (for example, “Say the department you are trying to reach” instead of “Press one for sales”).

I have the SpeechRecognitionEngine working perfectly to my specifications, with one exception. While recognizing spontaneous speech I must account for disfluencies (‘uh’, ‘um’, ‘er’, ‘you know’, ‘like’). My question is, are there any methods within the .NET framework that allow the recognition engine to bypass these utterances and continue searching for actual speech?

If there aren’t any pre-supplied methods, how would you go about bypassing these disfluencies? I suspect the answer may lie in how I construct my grammar, but any insight would be greatly appreciated.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-08T02:26:18+00:00

The way to handle this is in your grammars: add the disfluencies to the rules as optional items. This is where tuning comes in for speech recognition. Review the unrecognized phrases in your application and listen to the audio recordings to find out what users are saying that is “out of grammar”, then add those phrases.

For example, suppose you ask the user, “What would you like to eat, a pizza or a hamburger?” If your grammar is only set up to handle “pizza” or “hamburger” and the user responds “um pizza”, recognition will fail as out of grammar. You need to add “um” to the rule in such a way that it is optional. If you are using XML grammars, it may look something like this:

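 <rule id="whatToEat">
   <!-- Optional filler: in SRGS, repeat belongs on <item>, and a local rule is referenced as "#id" -->
   <item repeat="0-1"><ruleref uri="#disfluencies" /></item>
   <one-of>
     <item>pizza</item>
     <item>hamburger</item>
   </one-of>
 </rule>
 <rule id="disfluencies">
   <one-of>
     <item>uh</item>
     <item>um</item>
   </one-of>
 </rule>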

If you do not want the disfluencies included in the returned values, use tags to return a semantic interpretation instead of the raw text. How you specify the semantic interpretation can vary from platform to platform, but here is one example:

 <rule id="whatToEat">
   <!-- The optional filler carries no tag, so it adds nothing to the result -->
   <item repeat="0-1"><ruleref uri="#disfluencies" /></item>
   <one-of>
     <item>pizza<tag>out.mySlot="pizza";</tag></item>
     <item>hamburger<tag>out.mySlot="hamburger";</tag></item>
   </one-of>
 </rule>
 <rule id="disfluencies">
   <one-of>
     <item>uh</item>
     <item>um</item>
   </one-of>
 </rule>

Microsoft has a discussion on semantic interpretation here.
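For context, rule fragments like the ones above have to sit inside a complete grammar document before an engine can load them. The following is only a sketch of what such a file might look like, not a tested configuration for any particular platform. It assumes the W3C SRGS XML format with the `semantics/1.0` tag format; the `en-US` language, the `0-2` repeat range, and the extra filler words (taken from the question) are illustrative choices. In SRGS the `repeat` attribute is defined on `<item>`, so the filler reference is wrapped in an item here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: language, repeat range, and filler list are assumptions. -->
<grammar version="1.0" xml:lang="en-US" root="whatToEat"
         tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="whatToEat" scope="public">
    <!-- Allow zero to two leading fillers, e.g. "um like pizza" -->
    <item repeat="0-2"><ruleref uri="#disfluencies"/></item>
    <one-of>
      <!-- Only the tagged items contribute to the semantic result -->
      <item>pizza <tag>out.mySlot = "pizza";</tag></item>
      <item>hamburger <tag>out.mySlot = "hamburger";</tag></item>
    </one-of>
  </rule>

  <!-- Fillers carry no tags, so they never appear in out.mySlot -->
  <rule id="disfluencies">
    <one-of>
      <item>uh</item>
      <item>um</item>
      <item>er</item>
      <item>like</item>
      <item>you know</item>
    </one-of>
  </rule>
</grammar>
```

Keeping the filler list short is a deliberate choice in this sketch: every optional word widens what the recognizer will accept, so it is usually safer to add only the fillers that actually show up in your out-of-grammar recordings.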
