The following code:
$string = '۱۲۳۴۵۶۷۸۹۰';
$regex = '@۱@';
preg_match_all($regex, $string, $match);
var_dump($match);
will output:
array(1) {
[0] =>
array(1) {
[0] =>
string(2) "۱"
}
}
but
$regex2 = '@[۱]@';
preg_match_all($regex2, $string, $match);
var_dump($match);
will output:
array (size=1)
0 =>
array (size=11)
0 => string '�' (length=1)
1 => string '�' (length=1)
2 => string '�' (length=1)
3 => string '�' (length=1)
4 => string '�' (length=1)
5 => string '�' (length=1)
6 => string '�' (length=1)
7 => string '�' (length=1)
8 => string '�' (length=1)
9 => string '�' (length=1)
10 => string '�' (length=1)
What I actually want is to use a regex like [۱۲۳۴۵۶۷۸۹۰], but the function produces strange results with such patterns. I am using PHP 5.4.
Try adding the Unicode flag, i.e. the u modifier after the closing delimiter, as in '@[۱]@u'.
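A minimal sketch of that suggestion, assuming the script file itself is saved as UTF-8 and that PCRE was built with UTF-8 support (the usual case for PHP). The full digit class is the one from the question; the variable name `$count` is only illustrative.

```php
<?php
// Same input as in the question: ten Persian digits, UTF-8 encoded.
$string = '۱۲۳۴۵۶۷۸۹۰';

// The trailing "u" modifier makes PCRE treat both the pattern and the
// subject as UTF-8, so each digit in the class counts as one character
// instead of a sequence of separate bytes.
$regex = '@[۱۲۳۴۵۶۷۸۹۰]@u';

$count = preg_match_all($regex, $string, $match);

var_dump($count);     // number of matches found
var_dump($match[0]);  // the matched digits, one per element
```

With the modifier in place, every element of `$match[0]` should be a complete two-byte digit rather than a lone byte.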
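The reason is that ۱ is actually several bytes long (two bytes in UTF-8, hence the string(2) in the first dump). On its own it is harmless, because the pattern then matches that exact byte sequence, which is either the symbol itself or, coincidentally, the same bytes in a row. In a character class without the Unicode flag, however, each byte is treated as a separate class member, so any of those bytes may match an individual byte of the other characters. That is what happens here: the digits sit close together in the Unicode map, so their encodings share a byte.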
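To make the byte-level explanation concrete, here is a small sketch that compares the same class with and without the u modifier. It assumes the source file is saved as UTF-8; `$byteMatches` and `$charMatches` are illustrative names, not part of the original code.

```php
<?php
$string = '۱۲۳۴۵۶۷۸۹۰';

// Ten characters, but twice as many bytes in UTF-8.
$byteLength = strlen($string);

// Show the raw bytes of a single digit as hex.
echo bin2hex('۱'), "\n";

// Without "u": the class holds the two bytes of ۱ as separate members.
// The byte shared by all ten digits matches ten times, and the second
// byte of ۱ matches once more.
$byteMatches = preg_match_all('@[۱]@', $string, $m1);

// With "u": the class holds exactly one character, ۱.
$charMatches = preg_match_all('@[۱]@u', $string, $m2);

var_dump($byteLength, $byteMatches, $charMatches);
```

The eleven single-byte matches in the byte-mode run correspond to the eleven unprintable entries in the question's second dump.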