I have a set of characters like
., !, ?, ;, (space)
and a string, which may or may not be UTF-8 (in any language).
Is there an easy way to find out whether the string contains any of the characters in the set above?
For example:
这是一个在中国的字符串。
which translates to
This is a string in Chinese.
The dot character looks different in the first string. Is that a totally different character, or the UTF-8 counterpart of the dot?
Or maybe there’s a list somewhere with Unicode punctuation character codes?
Unicode defines character properties (PHP Docs), for example symbols, letters and the like. You can search a string for any character of a specific class with preg_match (Docs) and the u modifier. However, your string needs to be valid UTF-8 for that to work.
You can test this on your own; I created a little script that tests for all properties via preg_match.
Related: PHP – Fast way to strip all characters not displayable in browser from utf8 string.
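As a rough sketch of the approach described above (not the original test script), assuming PHP 7 or later: the function names below are hypothetical, and only preg_match() and var_dump() are built-ins. The class \p{P} covers Unicode punctuation and \p{Z} covers separators, including spaces. The full stop at the end of the Chinese example is U+3002 (IDEOGRAPHIC FULL STOP), a different code point from the ASCII dot U+002E, so a literal list of ASCII characters will not match it, while \p{P} will.

```php
<?php
// Hypothetical helper names, for illustration only.

function isValidUtf8(string $text): bool
{
    // With the u modifier, preg_match() returns false if the subject is not valid UTF-8.
    return preg_match('//u', $text) === 1;
}

function containsAsciiSet(string $text): bool
{
    // Only the literal characters from the question: . ! ? ; and space.
    return preg_match('/[.!?; ]/u', $text) === 1;
}

function containsPunctuationOrSpace(string $text): bool
{
    // \p{P} = any Unicode punctuation, \p{Z} = any Unicode separator (spaces included).
    return preg_match('/[\p{P}\p{Z}]/u', $text) === 1;
}

$chinese = '这是一个在中国的字符串。';

var_dump(isValidUtf8($chinese));                // bool(true)
var_dump(containsAsciiSet($chinese));           // bool(false): no ASCII dot or space
var_dump(containsPunctuationOrSpace($chinese)); // bool(true): U+3002 is punctuation
```

If the input might not be UTF-8, checking it first (as isValidUtf8() does here) avoids mistaking a failed match on invalid input for "no punctuation found".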