I have to process a string that could include all sorts of non-standard characters and I’ve been asked to provide a regular expression that will match and remove all characters that are non-alphanumeric except punctuation and spaces.
Is there a way to do this?
From regular-expressions.info:
\p{P} or \p{Punctuation}: any kind of punctuation character.
\p{L} or \p{Letter}: any kind of letter from any language.
\p{Nd} or \p{Decimal_Digit_Number}: a digit zero through nine in any script except ideographic scripts.

Your regex would then look like this:

[^\p{L}\p{Nd}\p{P}\s]
This would match anything that is not a letter, not a digit, not punctuation and not a space.
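As a rough illustration of that rule, here is a minimal Python sketch. Python's built-in re module does not support \p{...} properties, so this swaps in a different technique: it checks each character's Unicode general category with unicodedata and treats str.isspace() as the stand-in for "space". The function name strip_unwanted is made up for this example.

```python
import unicodedata


def strip_unwanted(text: str) -> str:
    """Drop every character that is not a letter, decimal digit,
    punctuation mark, or whitespace (hypothetical helper)."""
    kept = []
    for ch in text:
        cat = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Po", "So"
        if (
            cat.startswith("L")      # any letter, like \p{L}
            or cat == "Nd"           # decimal digit, like \p{Nd}
            or cat.startswith("P")   # punctuation, like \p{P}
            or ch.isspace()          # whitespace (assumed equivalent of "space")
        ):
            kept.append(ch)
    return "".join(kept)


print(strip_unwanted("Héllo, wörld! ☃ 42€"))  # prints "Héllo, wörld!  42"
```

Note that currency and math signs such as $, € and + are in the Unicode Symbol categories, not Punctuation, so a filter built on these properties removes them too. If you need to keep them, you would have to allow the symbol categories as well.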