I’ve been playing around with the preg_ functions for a while now with no luck. I tried removing segments of a string, and I tried taking them out and putting them back together, but nothing worked.
I have an array of allowed characters or segments, and I simply want to remove anything from a string that isn’t in this array. How can I do this without ruining the structure of the string?
This is what I would expect it to do:
$allowed = array('<', '>', 'p', 'sc');
echo clean('<script>'); // <scp>
Bonus question: Should I use mb_ereg_match to ensure UTF-8 works as well?
Thanks in advance.
Removing everything but a set of characters is easily done with an expression such as [^a-c], which matches everything but the lower-case characters a, b, c. For character sequences (like your sc) this will of course not work.
But if you know which characters you want to keep, you could turn the game around. Extract what you want to keep and ignore the rest.
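The "extract what you want to keep" idea could be sketched roughly like this. This is only one possible way to do it, not necessarily what the original answer showed: the clean() signature with a second $allowed parameter is an assumption (the question calls clean() with one argument), and sorting the segments by length is an extra precaution so that a longer segment such as sc is tried before a shorter one sharing its prefix. It assumes PHP 7.4 or later for the arrow functions.

```php
<?php
// Hypothetical helper: keep only the allowed segments, in their original order.
function clean(string $input, array $allowed): string
{
    if ($allowed === []) {
        return '';
    }

    // Try longer segments first so 'sc' wins over a shorter allowed
    // segment that shares its prefix (for example 's').
    usort($allowed, fn(string $a, string $b): int => strlen($b) <=> strlen($a));

    // Escape each segment so regex metacharacters are matched literally.
    $quoted = array_map(fn(string $s): string => preg_quote($s, '/'), $allowed);

    // The /u flag makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/' . implode('|', $quoted) . '/u';

    // $matches[0] holds every full match, left to right.
    if (!preg_match_all($pattern, $input, $matches)) {
        return '';
    }

    return implode('', $matches[0]);
}

$allowed = array('<', '>', 'p', 'sc');
echo clean('<script>', $allowed); // <scp>
```

Everything the pattern does not match (here r, i and t) is simply never collected, so the remaining pieces keep their original order.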
PCRE can do UTF-8 with the /u flag. The mb_ereg_* functions are slower than PCRE and should only be used when dealing with a charset other than UTF-8 or ISO-8859-1.
may just as well be
the latter is probably a teeny bit faster…