For example this:
<!-- Every character is converted to byte (hex) values according to the encoding in use -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <!-- This only tells the browser how to interpret the byte values being displayed -->
<?php
/* PHP strings are byte streams */
/* PHP treats a string as the byte (hex) values produced by the encoding used */
$string = "€"; // Byte values under the file's encoding (UTF-8): U+20AC is E2 82 AC
if (preg_match('/\xE2\x82\xAC/', $string, $m)) {
    echo "Match<br>";
    print_r($m);
} else {
    echo "Don't Match";
}
?>
As long as you use the correct byte sequences to match Unicode characters, is Unicode support not needed?
Or am I thinking about this wrong?
For that particular match, you don’t need Unicode support. A simple direct string match always works between two UTF-8 strings (that was a deliberate design feature of UTF-8), but you wouldn’t use regex if all you needed was a direct string match: for your example you’d be better off with strpos.

Many other regex features will behave unexpectedly without Unicode support. For example:

A quantifier applied to the euro sign, such as /€+/: with Unicode support, that’s multiple € signs (\xE2\x82\xAC\xE2\x82\xAC\xE2\x82\xAC...). Without it, that’s the first two bytes of a € symbol followed by any number of 0xAC bytes (\xE2\x82\xAC\xAC\xAC\xAC...), so the only valid UTF-8 sequence it would match is a single €.

A character class containing the euro sign, such as /[x€]/: with Unicode support, it matches x or a euro. Without Unicode support, it matches x or the byte 0xE2 or the byte 0x82 or the byte 0xAC.

And so on.
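
Here is a minimal sketch of the difference, assuming PHP's /u pattern modifier is what "Unicode support" refers to. The variable names are made up for illustration, and the euro sign is written as explicit bytes so the result does not depend on how the file itself is saved.

```php
<?php
// U+20AC (€) encoded as UTF-8 is the three bytes E2 82 AC.
$euro  = "\xE2\x82\xAC";
$euros = $euro . $euro . $euro;

// A direct string match needs no Unicode support: strpos compares bytes.
$offset = strpos("10 " . $euro, $euro); // byte offset of the euro sign

// Without /u, "+" applies only to the last byte (0xAC) of the pattern.
preg_match("/{$euro}+/", $euros, $byteMatch);
// With /u, "+" applies to the whole € character.
preg_match("/{$euro}+/u", $euros, $charMatch);

echo strlen($byteMatch[0]), "\n"; // 3 (one euro sign)
echo strlen($charMatch[0]), "\n"; // 9 (all three euro signs)

// Without /u, the class is the set of bytes x, 0xE2, 0x82, 0xAC.
preg_match("/[x{$euro}]/", $euro, $byteClass);
// With /u, the class is the set of characters x and €.
preg_match("/[x{$euro}]/u", $euro, $charClass);

echo bin2hex($byteClass[0]), "\n"; // e2 (a lone byte, not valid UTF-8)
echo bin2hex($charClass[0]), "\n"; // e282ac (the whole euro sign)
```

Note that the patterns are built in double-quoted strings so PHP inserts the raw bytes. Writing \xE2 inside a single-quoted pattern with /u would instead ask PCRE for the code point U+00E2.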