I am trying to write a script that turns "a" into "an" when necessary, and it is harder than I thought.
var txt = "This is a apple.";
var pos = txt.search(/a\s[aeiou]/i);
// If a match was found, insert "n" right after the "a"; otherwise leave txt unchanged.
txt = pos != -1 ?
    txt.substring(0, pos + 1) + "n" + txt.substring(pos + 1) :
    txt;
// txt is now "This is an apple."
This works, but when I try "There are 60 minutes in a hour.", it does not change "a" to "an", because the regex only looks for a vowel letter after the "a". So I changed it:
var pos = txt.search(/a\s([aeiou]|hour)/i);
Now it works (at least for "hour"). But if I try "There are people in a university.", it changes it to "an university", which is not correct.
So, is there a regular expression that can cover the rules for using "a" and "an" in English? Thanks!
There was a very good thread about this on Stack Overflow a while ago: How can I correctly prefix a word with "a" and "an"?
Basically the consensus was that the best way involves a large dataset from which to learn, and the second-best way involves a pronunciation dictionary such as the CMU dict designed for speech synthesis.
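As a rough sketch of the pronunciation-dictionary idea (not the method from that thread), you could pick the article from the first phoneme of the following word instead of its first letter. Everything named here is my own assumption: `PRONUNCIATIONS` is a tiny hand-typed table in the style of the CMU dict rather than the real file, and `articleFor` and `fixArticles` are hypothetical helper names. Words missing from the table fall back to the same spelling-based guess as the regex in the question.

```javascript
// Hypothetical, hand-typed subset in the style of the CMU dict.
// A real script would load the full dictionary file instead.
const PRONUNCIATIONS = new Map([
  ["apple", "AE1 P AH0 L"],
  ["hour", "AW1 ER0"],
  ["university", "Y UW2 N AH0 V ER1 S AH0 T IY0"],
]);

// ARPAbet vowel phonemes, with the stress digit removed.
const VOWEL_PHONEMES = new Set([
  "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
  "IH", "IY", "OW", "OY", "UH", "UW",
]);

// Return "a" or "an" for the given word, based on its first sound.
function articleFor(word) {
  const entry = PRONUNCIATIONS.get(word.toLowerCase());
  if (entry === undefined) {
    // Unknown word: fall back to guessing from the spelling.
    return /^[aeiou]/i.test(word) ? "an" : "a";
  }
  const firstPhoneme = entry.split(" ")[0].replace(/\d/g, "");
  return VOWEL_PHONEMES.has(firstPhoneme) ? "an" : "a";
}

// Rewrite every standalone "a"/"an" that is followed by a word.
function fixArticles(text) {
  return text.replace(/\b(a|an)(\s+)([A-Za-z]+)/gi, (match, article, space, word) => {
    const correct = articleFor(word);
    // Keep the capitalization of the original article.
    const isCapitalized = article[0] === article[0].toUpperCase();
    const cased = isCapitalized ? correct[0].toUpperCase() + correct.slice(1) : correct;
    return cased + space + word;
  });
}

console.log(fixArticles("This is a apple."));
console.log(fixArticles("There are 60 minutes in a hour."));
console.log(fixArticles("There are people in a university."));
```

The `\b` in the pattern keeps it from matching an "a" at the end of another word, which the `search` version in the question would do. The lookup is only as good as the table behind it, so acronyms, numbers, and words with more than one pronunciation would still need separate handling.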
To give an example from the CMU dict: