I’m using NHunspell to check a string for spelling errors like so:
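using System.Linq;   // Where / ToArray
using NHunspell;     // Hunspell class

var words = content.Split(' ');
string[] incorrect;
using (var spellChecker = new Hunspell(affixFile, dictionaryFile))
{
    // Spell(word) returns true when the word is found in the dictionary
    incorrect = words.Where(x => !spellChecker.Spell(x))
                     .ToArray();
}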
This generally works, but it has some problems. For example, if I’m checking the sentence “This is a (very good) example”, it reports “(very” and “good)” as misspelled. Similarly, if the string contains a time such as “8:30”, it reports that as a misspelled word, and commas and other punctuation attached to words cause the same problem.
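A minimal Python sketch of what is going on. Python is used only for brevity; C#'s Split(' ') splits this sentence the same way. Splitting on single spaces leaves brackets and commas attached to the neighbouring word. Trimming punctuation from both ends of each token is one simple mitigation, assumed here rather than taken from NHunspell; the helper names naive_tokens and trimmed_tokens are hypothetical.

```python
import string


def naive_tokens(text: str) -> list[str]:
    # Mirrors content.Split(' '): split on single spaces only, so
    # punctuation stays glued to the neighbouring word.
    return text.split(' ')


def trimmed_tokens(text: str) -> list[str]:
    # Hypothetical mitigation: split on any whitespace, strip ASCII
    # punctuation from both ends of each token, and drop tokens that
    # end up empty (e.g. a lone "-").
    tokens = (t.strip(string.punctuation) for t in text.split())
    return [t for t in tokens if t]


sentence = "This is a (very good) example, I think."
print(naive_tokens(sentence))
# ['This', 'is', 'a', '(very', 'good)', 'example,', 'I', 'think.']
print(trimmed_tokens(sentence))
# ['This', 'is', 'a', 'very', 'good', 'example', 'I', 'think']
```

Interior punctuation survives the trim, so a time such as “8:30” is still passed to the dictionary as a single token.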
Microsoft Word is smart enough to recognize a time, fraction, or comma-delimited list of words. It knows when not to use an English dictionary, and it knows when to ignore symbols. How can I get a similar, more intelligent spell check in my software? Are there any libraries that provide a little more intelligence?
EDIT:
I don’t want to force users to have Microsoft Word installed on their machine, so using COM interop is not an option.
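Word's actual rules are not described here, but one hypothetical way to approximate “knowing when not to use an English dictionary” is to test each token against a few patterns (times, fractions, plain numbers) and skip the dictionary lookup when one matches. This Python sketch assumes tokens have already had surrounding punctuation removed; SKIP_PATTERNS, should_skip and misspelled are illustrative names, and spell is a stand-in for a real lookup such as NHunspell's Spell.

```python
import re
from typing import Callable, Iterable

# Illustrative patterns for tokens that should bypass the dictionary.
# These are assumed heuristics, not Microsoft Word's actual rules.
SKIP_PATTERNS = [
    re.compile(r"\d{1,2}:\d{2}(?::\d{2})?"),  # times: 8:30, 14:05:59
    re.compile(r"\d+/\d+"),                   # simple fractions: 3/4
    re.compile(r"[-+]?\d+(?:[.,]\d+)*"),      # numbers: 42, 3.14, 1,000
]


def should_skip(token: str) -> bool:
    # fullmatch: the whole token must fit a pattern, so "8:30pm" is
    # still sent to the dictionary while a bare "8:30" is not.
    return any(p.fullmatch(token) for p in SKIP_PATTERNS)


def misspelled(tokens: Iterable[str], spell: Callable[[str], bool]) -> list[str]:
    # spell(word) should return True for a correctly spelled word,
    # like NHunspell's Spell; it is only called for non-skipped tokens.
    return [t for t in tokens if not should_skip(t) and not spell(t)]


known = {"meet", "at", "add", "cups", "of", "flour"}
tokens = ["Meet", "at", "8:30", "add", "3/4", "cups", "of", "flowr"]
print(misspelled(tokens, lambda w: w.lower() in known))
# ['flowr']
```

Keeping the patterns in a list makes it easy to add more cases (dates, version numbers, URLs) without touching the checking loop.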
If your spell checker is really that stupid, you should pre-tokenize its input to get the words out and feed those one at a time (or as a string joined with spaces). I’m not familiar with C#/.NET, but in Python, you’d use a simple RE like \w+ for that, and I bet .NET has something very similar. In fact, according to the .NET docs, \w is supported, so you just have to find out how re.findall is called there.
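The tokenizing step described in the answer might look like this in Python; the .NET counterpart of re.findall is Regex.Matches from System.Text.RegularExpressions. The spell argument is a hypothetical stand-in for the real checker, and CONTRACTION_RE is an assumed refinement that is not part of the answer.

```python
import re
from typing import Callable

WORD_RE = re.compile(r"\w+")
# Assumed refinement: also keep straight or curly apostrophes inside a
# word, so "don't" and "I’m" stay whole.
CONTRACTION_RE = re.compile(r"\w+(?:['\u2019]\w+)*")


def words(text: str) -> list[str]:
    # Runs of word characters only; brackets, commas, colons and other
    # punctuation between them are discarded.
    return WORD_RE.findall(text)


def check_text(text: str, spell: Callable[[str], bool]) -> list[str]:
    # spell is a stand-in for the real checker (e.g. NHunspell's Spell)
    # and should return True for correctly spelled words.
    return [w for w in words(text) if not spell(w)]


print(words("This is a (very good) example"))
# ['This', 'is', 'a', 'very', 'good', 'example']
print(words("Meet at 8:30, don't be late"))
# ['Meet', 'at', '8', '30', 'don', 't', 'be', 'late']
print(CONTRACTION_RE.findall("Meet at 8:30, don't be late"))
# ['Meet', 'at', '8', '30', "don't", 'be', 'late']
```

Because \w matches letters, digits and the underscore, “8:30” comes out as “8” and “30”, and a plain \w+ splits “don't” into “don” and “t”, which a dictionary may then reject.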