This question asked how to detect UTF-8 strings: “How to detect if have to apply utf8 decode or encode on a string?”
The solution was this:
if (preg_match('!!u', $string))
{
    // this is utf-8
}
else
{
    // definitely not utf-8
}
I haven’t been able to figure out how to break down the “!!u” expression. I clicked through all of PHP’s PCRE stuff and might have missed the description of the “!” marks and the “u”. I tried running it through Perl’s YAPE::Regex::Explain (as seen in Please explain this Perl regular expression) and couldn’t get anything that made sense [I’m no Perl expert, so I don’t know if I fed it the right expression/string].
So… how exactly does preg_match('!!u', $string) work?
It’s just an empty regular expression.
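To make that concrete, here is a minimal sketch. The two sample strings are my own assumptions, not from the question. An empty pattern matches any string, so the only thing left to decide the result is whether the subject passes the UTF-8 check that the trailing u switches on.

```php
<?php
// Sample inputs are illustrative assumptions, not taken from the question.
$valid   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$invalid = "caf\xE9";     // "café" encoded as ISO-8859-1; a lone 0xE9 byte is not valid UTF-8

// '!!' is an empty pattern between two '!' delimiters, equivalent to '//' or '##'.
var_dump(preg_match('!!', $invalid));  // int(1): an empty pattern matches any string
var_dump(preg_match('//', $invalid));  // int(1)

// With u appended, the subject is validated as UTF-8 before matching.
var_dump(preg_match('!!u', $valid));   // int(1)
var_dump(preg_match('!!u', $invalid)); // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Note that the failure case returns false rather than 0, so the plain if (...) test in the question works, but a strict comparison against 0 would not catch it.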
`!` is the delimiter and `u` is the modifier.
As for why it works, from the PHP Manual’s description of the `u` modifier (emphasis mine):