I have a MySQL database being fed data from a PHP-powered form. The table columns are collated as utf8_bin, and the connection charset is set to utf8, as is the HTML page's charset.
After extensive Googling, I can't find any clear way of using preg_replace to strip unwanted characters (and numbers) while keeping upper- and lowercase accented letters, umlauts and spaces. I've cobbled together something that seems to work, but I don't understand it at all, so I have no idea how secure it is. Hence the doubling up with the escape call:
```php
$lname = preg_replace("/(<\/?)(\w+)([^>]*>)/e","", $lname);
$lname = mysql_real_escape_string($lname);
```
What I really need is something that could take the following name (mine, as an example), “Éamonn Mac Lochlainn”, and store it as such, rather than as “c389616d6f6e6eMacLochlainn”. I've also looked at strip_tags, allowing “ÁÉÍÓÚáéíóú”. Is that the way forward?
Any help, and in particular an explanation of what's going on in this snippet (the \w+ bits), would be greatly appreciated.
\w matches a word character according to the current locale. If the locale is set correctly for all of your data, there's no problem. If the locale is not enough, you could instead treat all Unicode letters and whitespace as valid and remove everything else, using a Unicode property such as \p{L} together with the /u (UTF-8) switch.
For more information about \w, see Escape sequences in the PHP manual.
For more information about Unicode properties (the \p in combination with the /u switch), see Unicode Properties in the PHP manual.
You seem to do a bit more than just validating characters: you are also stripping HTML tags.
strip_tags would indeed work for this (call it before the replace).
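Putting the two suggestions together, a minimal sketch might look like the following. The function name clean_name is hypothetical (it is not from the question or the answer), and the sketch assumes PHP 7 or later and a UTF-8 encoded source file. It drops the /e modifier from the question's pattern: /e was deprecated in PHP 5.5 and removed in PHP 7, and strip_tags takes care of the tag removal anyway.

```php
<?php
// Hypothetical helper (not from the original answer): strip HTML tags first,
// then drop anything that is not a Unicode letter or whitespace.
function clean_name(string $input): string
{
    // 1. Remove HTML/PHP tags before filtering characters, as suggested above.
    $noTags = strip_tags($input);

    // 2. [^\p{L}\s] matches any character that is NOT a Unicode letter
    //    (\p{L}: É, á, ü, ...) and NOT whitespace (\s); those are removed.
    //    The /u modifier makes PCRE treat pattern and subject as UTF-8,
    //    so a multi-byte character such as "É" is matched as one letter.
    $lettersOnly = preg_replace('/[^\p{L}\s]/u', '', $noTags);

    // preg_replace returns null on error, e.g. malformed UTF-8 input with /u.
    return $lettersOnly ?? '';
}

echo clean_name('<b>Éamonn</b> Mac Lochlainn!'), PHP_EOL; // prints "Éamonn Mac Lochlainn"
```

Note that this whitelist also removes apostrophes and hyphens, so “O'Brien” becomes “OBrien”; if such names matter, those characters would need to be added to the character class. Escaping the value for the database (mysql_real_escape_string in the question) is still a separate step: filtering characters is not a substitute for it.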