I need to remove every character from a string that is not in the Unicode Letter or Mark categories. Currently I’m splitting the string and then joining the pieces, like so:
text.split("[\\p{P} \\t\\n\\r]")
My regex, however, is acutely inadequate. Please help.
EDIT
I think this will work:
text.split("[\\P{M}\\P{L}]")
Try this:
See more at http://www.regular-expressions.info/unicode.html
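For illustration, here is a minimal sketch of one way to do this in Java, assuming the goal is simply to delete every character that is neither a Letter nor a Mark. The class name `LetterMarkFilter` and the method `keepLettersAndMarks` are made up for this example. Note that a class like `[\\P{M}\\P{L}]` is a union of "not a Mark" and "not a Letter", so it matches every character (a letter is not a mark, and a mark is not a letter). A negated class, `[^\\p{L}\\p{M}]`, expresses "neither" instead.

```java
import java.util.regex.Pattern;

class LetterMarkFilter {
    // Matches any single character that is neither a Letter (L) nor a Mark (M).
    private static final Pattern NOT_LETTER_OR_MARK =
            Pattern.compile("[^\\p{L}\\p{M}]");

    // Hypothetical helper: deletes everything except letters and marks.
    static String keepLettersAndMarks(String text) {
        return NOT_LETTER_OR_MARK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Punctuation, spaces, and digits are dropped.
        System.out.println(keepLettersAndMarks("Hello, world! 123"));
        // Combining accents (category M) are kept with their base letters.
        System.out.println(keepLettersAndMarks("e\u0301te\u0301 2024"));
    }
}
```

Using `replaceAll` with an empty replacement avoids the split-then-join round trip. If you do want the separate pieces, the same pattern with a `+` quantifier can be passed to `split`.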