How can I remove characters such as punctuation, commas, and dashes from a string in a multibyte-safe manner?
I will be working with input from many different languages, and I am wondering if there is something that can help me with this.
Thanks
There are Unicode character classes (`\p{...}` properties) that you can use in the regex.
To match any non-letter characters you can just use `\PL+`, the negation of `\p{L}`. To not remove spaces, use a character class like `[^\pL\s]+`. Or really just remove punctuation with `\pP+`. And obviously don't forget the regex `/u` modifier, so the pattern and the subject are treated as UTF-8.
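As a rough sketch of how those three patterns could be applied, assuming PHP and its PCRE function `preg_replace` (which is what the `/u` modifier suggests, though the question does not name a language). The function names here are hypothetical, made up for illustration:

```php
<?php
// Hypothetical helper names; only the patterns come from the answer above.
// With /u, preg_replace returns null if the input is not valid UTF-8,
// hence the nullable return types.

// Remove every non-letter: digits, whitespace and punctuation all go.
function keep_letters_only(string $text): ?string
{
    return preg_replace('/\PL+/u', '', $text);
}

// Remove everything that is neither a letter nor whitespace.
function keep_letters_and_spaces(string $text): ?string
{
    return preg_replace('/[^\pL\s]+/u', '', $text);
}

// Remove only punctuation (Unicode category P); digits and spaces stay.
function strip_punctuation(string $text): ?string
{
    return preg_replace('/\pP+/u', '', $text);
}

$input = "Привет, мир! ¿Qué tal? 你好,世界。";
echo strip_punctuation($input), PHP_EOL;
// prints: Привет мир Qué tal 你好世界
```

Note that `\pP` covers punctuation only (commas, dashes, quotes, and so on). Symbols such as `$` or `+` belong to the separate category `\pS`, so a class like `[\pP\pS]+` may be closer to what you want if those should go too.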