It’s easy to write a speech recognition grammar file for only 50 words, because you can just do it by hand. What is the easiest, most efficient way to do it if you have 10,000 or 100,000 words?
Example:
Say we have ‘RC cola’ and ‘Pepsi cola’. We would have a grammar file consisting of 2 rules:
DRINK: (COLANAME ?[coke cola soda])
COLANAME: [rc pepsi]
It will recognize ‘RC’, ‘RC Coke’, ‘RC Cola’, ‘RC Soda’, ‘Pepsi’, ‘Pepsi Coke’, ‘Pepsi Cola’ and ‘Pepsi Soda’.
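To make the expansion concrete, here is a minimal Python sketch that models the two rules above as nested tuples and enumerates every phrase they accept. The tuple encoding and the `expand` function are hypothetical illustrations, not part of GSL or of any recognizer’s API.

```python
from itertools import product

# Hypothetical in-memory model of the two GSL-style rules above:
#   ("seq", [...])  -> items in order, like ( ... )
#   ("alt", [...])  -> exactly one of the items, like [ ... ]
#   ("opt", item)   -> the item or nothing, like ?item
#   plain string    -> a rule name if it is a key in GRAMMAR, else a word
GRAMMAR = {
    "DRINK": ("seq", ["COLANAME", ("opt", ("alt", ["coke", "cola", "soda"]))]),
    "COLANAME": ("alt", ["rc", "pepsi"]),
}


def expand(node):
    """Return every word sequence (a list of tuples) that node can match."""
    if isinstance(node, str):
        if node in GRAMMAR:
            return expand(GRAMMAR[node])
        return [(node,)]
    kind, body = node
    if kind == "alt":
        # Union of the alternatives' expansions.
        return [seq for item in body for seq in expand(item)]
    if kind == "opt":
        # The empty sequence, plus whatever the item matches.
        return [()] + expand(body)
    if kind == "seq":
        # Cartesian product of the parts, concatenated word by word.
        return [tuple(word for part in parts for word in part)
                for parts in product(*(expand(item) for item in body))]
    raise ValueError(f"unknown node kind: {kind}")


phrases = [" ".join(words) for words in expand("DRINK")]
print(len(phrases))  # 8
print(phrases)
```

The count is 2 × (3 + 1) = 8: two cola names times three optional suffixes plus the empty option. Sequences multiply, so a few shared rules can cover many phrases; that factoring is exactly what is hard to find for items like names that don’t share structure.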
Edit: I’m talking about grammars for speech recognition. Speech recognition systems need an accompanying grammar file (gsl, grxml) so they know what to recognize. I was also thinking not just about arbitrary words, but about things like names, which can’t be classified into categories.
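For a flat list such as names, the grammar can at least be generated mechanically instead of typed by hand. Below is a minimal sketch, assuming the names are already in a Python list and the target format is SRGS XML (grxml); the function name `names_to_grxml`, the rule id `NAME` and the sample names are hypothetical.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"
# Emit SRGS elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", SRGS_NS)


def names_to_grxml(names, rule_id="NAME", lang="en-US"):
    """Build a flat SRGS (grxml) grammar with one <item> per distinct name.

    names_to_grxml and its defaults are hypothetical, for illustration only.
    """
    grammar = ET.Element(f"{{{SRGS_NS}}}grammar", {
        "version": "1.0",
        "mode": "voice",
        "root": rule_id,
        f"{{{XML_NS}}}lang": lang,
    })
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule",
                         {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    seen = set()
    for name in names:
        # Lowercase and collapse whitespace so duplicates are detected.
        spoken = " ".join(name.lower().split())
        if spoken and spoken not in seen:
            seen.add(spoken)
            ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = spoken
    return ET.tostring(grammar, encoding="unicode")


# Hypothetical sample data; in practice this would be read from a file.
names = ["Alice Smith", "bob  jones", "ALICE SMITH", "Chloé Dupont"]
xml_text = names_to_grxml(names)
print(xml_text)
```

This only removes the manual typing. It does nothing about how well a recognizer copes with a very large flat list of similar-sounding names.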
I don’t have an answer that solves my problem yet, but Yuval’s answer suggests that this is still an active research area rather than a mature one. I understand that there’s probably no easy grammar fix available right now (at least outside the research labs). The only practical way to build a good grammar today is probably to keep learning from user inputs and to refactor the grammar files iteratively.
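The “learn from user inputs” loop can be sketched as a simple frequency count: compare what users actually said against the phrases the current grammar accepts, and list the most common misses as candidates for the next revision. This assumes you already have text transcripts of logged utterances; `grammar_gaps` and the sample log are hypothetical.

```python
from collections import Counter


def grammar_gaps(transcripts, covered_phrases, top_n=5):
    """Return the most frequent user phrases the current grammar does not cover.

    Hypothetical helper: both inputs are plain strings, compared after
    lowercasing and collapsing whitespace.
    """
    covered = {" ".join(p.lower().split()) for p in covered_phrases}
    misses = Counter()
    for utterance in transcripts:
        spoken = " ".join(utterance.lower().split())
        if spoken and spoken not in covered:
            misses[spoken] += 1
    # Highest counts first; ties keep first-seen order.
    return misses.most_common(top_n)


# Hypothetical grammar coverage and transcribed user log.
covered = ["rc", "rc cola", "pepsi", "pepsi cola"]
log = ["Pepsi Cola", "diet pepsi", "diet pepsi", "rc", "coca cola", "Diet Pepsi"]
print(grammar_gaps(log, covered))
# [('diet pepsi', 3), ('coca cola', 1)]
```

Phrases near the top of this list are the cheapest wins when refactoring the grammar; rare misses can wait for more data.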