Is there an easy-to-use Python module that’d do English or Finnish text validation?
It’d be OK if I could just check that the words exist in a user-defined dictionary, and possibly check that the grammar is somewhat okay.
I am planning to implement a fancy validator for the contents of a directory I put together a while back. This involves some simple stuff, like checking that the config scripts won’t crash and that they do their job properly. Apart from the text validation, it’s all quite easy.
For the validator, I should just be able to pass in whole files or strings of Unicode text.
I’m not sure what you’re trying to do, but if you’re looking for something that can say ‘this is valid English’ or ‘this is valid Finnish’, then you’re looking at a class of problems that is quite likely unsolvable.
If not, then use a dictionary and/or letter frequencies and Bayesian analysis to determine whether a given text is English-like or Finnish-like. If you’re trying to auto-detect a language, this is likely the best route, although you’ll run into problems with mixed-language text.
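To make that concrete, here is a minimal sketch of both ideas in plain Python: a lookup against a user-defined word list, and a naive Bayes guess based on single-letter frequencies. Everything in it is an assumption for illustration: the function names, the add-one smoothing, and especially the two tiny training samples, which are far too small for real use. You’d want to train on a proper corpus per language and probably on letter pairs or triples rather than single letters.

```python
import math
import re
from collections import Counter

# Runs of Unicode letters (no digits, no underscores).
WORD_RE = re.compile(r"[^\W\d_]+")

# Toy training samples (assumption: replace with a real corpus per language).
EN_SAMPLE = ("the quick brown fox jumps over the lazy dog and this is a "
             "sample of english text with words that are common")
FI_SAMPLE = ("tämä on suomenkielinen esimerkkiteksti jossa on paljon "
             "tavallisia sanoja ja pitkiä vokaaleja kuten ää ja yö")


def unknown_words(text, dictionary):
    """Return the words in `text` that are missing from `dictionary`."""
    known = {word.lower() for word in dictionary}
    return [w for w in WORD_RE.findall(text) if w.lower() not in known]


def train(sample):
    """Count how often each letter occurs in a training sample."""
    return Counter(ch for ch in sample.lower() if ch.isalpha())


def log_likelihood(text, counts, alphabet_size):
    """Log-probability of the letters in `text`, with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[ch] + 1) / (total + alphabet_size))
        for ch in text.lower()
        if ch.isalpha()
    )


def guess_language(text, models):
    """Pick the language whose letter model makes `text` most likely.

    Assumes equal prior probability for every language in `models`.
    """
    letters = {ch for ch in text.lower() if ch.isalpha()}
    alphabet = letters.union(*models.values())
    return max(
        models,
        key=lambda lang: log_likelihood(text, models[lang], len(alphabet)),
    )


MODELS = {"en": train(EN_SAMPLE), "fi": train(FI_SAMPLE)}

if __name__ == "__main__":
    print(unknown_words("Hello wrold", {"hello", "world"}))
    print(guess_language("hyvää päivää", MODELS))
```

The dictionary check only tells you which words are unknown, not whether the text is "valid", and the frequency guess only says which language the text looks most like. With mixed-language input you’d have to split the text and score the pieces separately.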