I’m trying to create a regular expression that matches words under the following conditions:
- Match words, which can contain letters such as æ, ø and å, as well as digits.
- If a word contains any of the following characters, it is invalid:
+ - & | ! ( ) { } [ ] ^ " ~ * ? : \
So for example these words are okay:
testæøå
test12
12test
But these should fail:
t+st
te&st
Just in case you don’t know, regex in C# can be considerably slower than plain string manipulation for simple checks like this one:
Regex in C#
You can speed it up by constructing the regex with RegexOptions.Compiled. This does make your program slower to start, however, because the pattern is compiled when the Regex object is created. If this is going to be any sort of web-based application (C#/Silverlight), I highly recommend using string manipulation and searching over regex, as it could otherwise be noticeably slow for anyone using the page.
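If you do go the regex route, a minimal sketch might look like the following. The class name WordRegex and the method IsValid are made up for illustration, and rejecting whitespace and empty strings is my assumption (the question only lists the punctuation characters). The pattern is a negated character class, so æøå, digits and anything else not listed are accepted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original question.
public static class WordRegex
{
    // One or more characters, none of which is a forbidden character.
    // Inside the class, '-', '[', ']' and '\' are escaped; "" is a literal
    // quote in a verbatim string. \s (whitespace) is an added assumption.
    // \z anchors at the very end, so a trailing newline is not ignored.
    private static readonly Regex ValidWord = new Regex(
        @"^[^+\-&|!(){}\[\]^""~*?:\\\s]+\z",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static bool IsValid(string word)
    {
        return word != null && ValidWord.IsMatch(word);
    }
}
```

Keeping the Regex in a static readonly field means the compilation cost is paid once rather than on every call.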
You can easily check each character of a word against the forbidden characters (or against their Unicode/ASCII codes) and accept or reject the word from there, usually with much better performance.
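As a rough sketch of the string-based approach, assuming the forbidden set is exactly the characters listed in the question and that an empty string should be rejected (both are my assumptions, as are the names WordCheck and IsValid):

```csharp
using System;

// Hypothetical helper, not from the original question.
public static class WordCheck
{
    // The characters the question lists as invalid.
    private static readonly char[] Forbidden =
    {
        '+', '-', '&', '|', '!', '(', ')', '{', '}',
        '[', ']', '^', '"', '~', '*', '?', ':', '\\'
    };

    public static bool IsValid(string word)
    {
        // IndexOfAny returns -1 when none of the forbidden characters occur.
        return !string.IsNullOrEmpty(word) && word.IndexOfAny(Forbidden) < 0;
    }
}
```

With this, WordCheck.IsValid("testæøå") accepts the word and WordCheck.IsValid("te&st") rejects it. Because it only blacklists characters, letters such as æøå need no special handling.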
If you are determined to use regex, consider Perl or another scripting language built around regex-based string processing, as these are often faster at it.