I have the following code:
preg_replace('/[^\w-]/u','.','Bréánná MÓÚLÍN');
Which on server A (PHP 5.3.5) returns:
“Bréánná.Móúlín” (as it should)
However, on server B (PHP 5.2.11) it returns:
“Br..n..M..l.n” (not what I want at all)
Am I right in thinking that this is down to whether or not PCRE_UCP was set when the whole thing was compiled?
Is there any way of overriding this if this is the case?
Failing that, is there any way of easily replacing such characters with a ‘standard’ equivalent? (Like utf8_decode but more expansive)
I am not sure whether PCRE_UCP defined during compilation affects preg_replace(), but a workaround to your problem is to use the multibyte string function mb_ereg_replace().
PHP 5.2 results: http://codepad.viper-7.com/UnZeyf
EDIT: I originally thought that the multibyte ereg functions supported Unicode character type escapes, but this turns out not to be true. Instead, you need to determine the ranges of characters that you consider “letters”. I used the character ranges from the XML Standard’s definition of NameChar with the following Java program to generate the RegExp string (as apparently the multibyte ereg functions do not support Unicode character escape sequences, either):
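The Java generator itself is not shown in this post. As a rough PHP sketch of the same idea (not the original program), the snippet below builds the character class from a small, illustrative subset of the XML NameChar ranges and writes the range endpoints as literal UTF-8 characters rather than escape sequences. The helper name codepoint_to_utf8() and the choice of ranges are assumptions made for this example, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: encode a single Unicode code point as a UTF-8 string.
function codepoint_to_utf8($cp) {
    return mb_convert_encoding(pack('N', $cp), 'UTF-8', 'UCS-4BE');
}

// Illustrative subset of the XML NameChar ranges (NOT the full list).
$ranges = array(
    array(0x30, 0x39),   // 0-9
    array(0x41, 0x5A),   // A-Z
    array(0x61, 0x7A),   // a-z
    array(0xC0, 0xD6),   // Latin letters with diacritics (first block)
    array(0xD8, 0xF6),   // Latin letters with diacritics (second block)
    array(0xF8, 0x2FF),  // remaining Latin-1 letters through U+02FF
);

// Build the class body with literal characters, e.g. "0-9A-Za-z...".
$class = '';
foreach ($ranges as $range) {
    $class .= codepoint_to_utf8($range[0]) . '-' . codepoint_to_utf8($range[1]);
}

// Tell the multibyte regex engine that pattern and subject are UTF-8.
mb_regex_encoding('UTF-8');

// Replace everything that is not in the listed ranges, "_" or "-".
$pattern = '[^' . $class . '_\\-]';
$result  = mb_ereg_replace($pattern, '.', 'Bréánná MÓÚLÍN');

echo $result, "\n";
```

With these ranges only the space falls outside the class, so the accented letters should be left alone and the space replaced by a dot. Characters beyond U+02FF would need further ranges from the NameChar definition.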