In my application, a client is uploading data from MS Word into a textarea. My regex skills are not so good 🙂
I need a regex to filter all the junk characters out of a string; the only acceptable input is characters that can be typed on a keyboard,
i.e., A-Z, a-z, 0-9, and all the special characters present on a keyboard, plus all currency symbols.
EDIT: I want to allow only ASCII codes, including extended ASCII: http://www.asciitable.com/
I have checked the ASCII table and all printable symbols it contains are present on any standard keyboard.
It’s hard to tell what defines “special characters present on the keyboard”, but I assume you mean printable non-alphanumeric characters. While all the Unicode whitespace characters (non-breaking space, zero-width non-joiner…) are indeed “special”, they are absent from most keyboards. The backspace character, while present on most keyboards, is typically interpreted by the OS, so I assume you don’t want that. A similar argument applies to the Tab key: while the tab character is easier to obtain than the newline character, it can’t normally be typed into a form input.
Concerning currency symbols, the character class \p{Sc} covers them, and C# regex seems to support this class.
Non-US keyboards contain many more characters (letters with diacritics, Cyrillic, Chinese/Japanese/Korean characters), but they don’t match your description of “A-Z, a-z, 0-9 and all the special characters present on keyboard + all currency symbols”. Of special interest is the Japanese end-of-sentence punctuation, which is a hollow circle instead of just a dot. However, while it matches your description, I believe you don’t want that either.
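To see what \p{Sc} actually matches, here is a minimal sketch. The class and method names (CurrencySymbolDemo, ContainsCurrencySymbol) are made up for illustration; only Regex and the \p{Sc} category come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencySymbolDemo
{
    // \p{Sc} is the Unicode general category "Symbol, Currency".
    private static readonly Regex CurrencySymbol = new Regex(@"\p{Sc}");

    // Hypothetical helper: true if the text contains at least one currency symbol.
    public static bool ContainsCurrencySymbol(string text)
    {
        return CurrencySymbol.IsMatch(text);
    }

    public static void Main()
    {
        // \u20AC is the euro sign, which is not part of ASCII.
        Console.WriteLine(ContainsCurrencySymbol("Price: 10 \u20AC")); // True
        Console.WriteLine(ContainsCurrencySymbol("Price: $10"));       // True
        Console.WriteLine(ContainsCurrencySymbol("Price: 10 EUR"));    // False
    }
}
```

Note that \p{Sc} reaches well beyond ASCII, so adding it to a whitelist lets non-ASCII characters such as the euro sign through.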
C# also supports \p{IsBasicLatin} (block names are case-sensitive), but that includes the ASCII control characters, which I assume you don’t want.
To sum up: your description matches the entire printable ASCII range (U+0020 through U+007E) and the newline \n. To check that a string is made up only of these, anchor a character class covering exactly that range to both ends of the string.
Reflecting your edit, also consider either all printable ASCII characters (most currency symbols are absent, $ isn’t) plus newline, or the entire ASCII range (U+0000 through U+007F), including the control characters and all ASCII whitespace.
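The two checks described above could be written roughly as follows. This is a sketch, not necessarily the exact patterns originally posted: the class and method names (AsciiValidation, IsPrintableAscii, IsAscii) are hypothetical, and \A and \z are used instead of ^ and $ so that the match always covers the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiValidation
{
    // Printable ASCII (space U+0020 through tilde U+007E) plus newline.
    private static readonly Regex PrintableAsciiAndNewline =
        new Regex(@"\A[\x20-\x7E\n]*\z");

    // Entire 7-bit ASCII range, control characters included.
    // Covers the same code points as \p{IsBasicLatin}.
    private static readonly Regex AnyAscii =
        new Regex(@"\A[\x00-\x7F]*\z");

    public static bool IsPrintableAscii(string s) => PrintableAsciiAndNewline.IsMatch(s);

    public static bool IsAscii(string s) => AnyAscii.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsPrintableAscii("Total: $5\nThanks!"));  // True
        Console.WriteLine(IsPrintableAscii("\u201Cquoted\u201D"));  // False: curly quotes from Word
        Console.WriteLine(IsPrintableAscii("tab\there"));           // False: tab is a control character
        Console.WriteLine(IsAscii("tab\there"));                    // True
    }
}
```

Neither pattern accepts "extended ASCII" (code points above U+007F); widening the upper bound of the class would be needed for that, and which characters that admits depends on the encoding in use.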
Ref:
MSDN character classes
MSDN character escapes
MSDN code example (adapted here):
regex replace (adapted here; strips out everything except A-Z, a-z, 0-9 and the following characters: ~ ` ! @ # $ % ^ & * ( ) _ + | - = \ { } [ ] : " ; ' < > ? , . /)
Concerning double quotes inside verbatim string literals: http://blogs.msdn.com/b/gusperez/archive/2005/08/10/450257.aspx
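A replace along those lines might look like the sketch below. It is an illustration under assumptions rather than the adapted MSDN snippet itself: JunkFilter and StripJunk are hypothetical names, and the character class is built directly from the list above. Inside the verbatim string, the double quote is written as "" and the regex metacharacters -, \, [ and ] are backslash-escaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JunkFilter
{
    // Negated class: matches any character that is NOT a letter, digit,
    // or one of the listed punctuation characters.
    private static readonly Regex Disallowed =
        new Regex(@"[^A-Za-z0-9~`!@#$%^&*()_+|\-=\\{}\[\]:"";'<>?,./]");

    // Hypothetical helper: removes every disallowed character.
    public static string StripJunk(string input)
    {
        return Disallowed.Replace(input, string.Empty);
    }

    public static void Main()
    {
        // \u2019 is a curly apostrophe, \u20AC the euro sign.
        Console.WriteLine(StripJunk("It\u2019s 5 \u20AC!")); // Its5!
    }
}
```

The space character is not in the list, so this version strips spaces as well; add a space (and \n, if line breaks should survive) to the class to keep them.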