I have a form where a user can post a global notice into the system (for other users to see).
When a user wants to see a notice, the system outputs its HTML directly from the DB.
I’d like to allow some HTML tags to stay intact and to escape the rest with htmlspecialchars().
I already tried the
str_replace($search, $replace, htmlspecialchars($str))
strategy, but it seems to be really slow. Too slow, actually. It is also not guaranteed to work in every case. Is there an alternative?
What I want is something that does the job of strip_tags(), except that instead of stripping the disallowed tags it applies htmlspecialchars() to them.
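The strip_tags()-like behaviour described above can be sketched as a single pass that splits the string into tags and text, keeps whitelisted tags verbatim and escapes everything else. This is a minimal sketch, not a built-in: the function name escape_disallowed_tags() and the tag-matching regex are assumptions, and tags whose attribute values contain a literal > are not handled.

```php
<?php
// Hypothetical helper, not a PHP built-in: like strip_tags(), but instead of
// removing disallowed tags it escapes them with htmlspecialchars().
function escape_disallowed_tags(string $html, array $allowed): string
{
    // Capture whole tags so they stay in the result array; with one capture
    // group, even indexes are text and odd indexes are tags.
    $parts = preg_split('~(</?[a-zA-Z][a-zA-Z0-9]*[^<>]*>)~', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $allowed = array_map('strtolower', $allowed);
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Extract the tag name from "<name ...>" or "</name>".
            preg_match('~^</?([a-zA-Z][a-zA-Z0-9]*)~', $part, $m);
            if (in_array(strtolower($m[1]), $allowed, true)) {
                $out .= $part; // allowed tag: kept verbatim, attributes included
                continue;
            }
        }
        // Plain text, or a tag that is not on the whitelist.
        $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

echo escape_disallowed_tags('<b>bold</b> <script>alert(1)</script>', ['b', 'i', 'a']);
// <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because allowed tags keep their attributes verbatim, something like onclick="..." on an allowed tag would still get through; if attributes must be filtered too, a dedicated sanitizer such as HTML Purifier is the safer route.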
Additional info (by request):
$str can be any size. For testing I used a large string (1M characters, generated randomly, with some allowed and some disallowed tags inside, all of them with attributes) to simulate one of the worst-case scenarios, with the logic: if it works for this, it should work for simpler cases.
The server took 5 s to run the complete str_replace (with htmlspecialchars). This test was run on my computer, which has a 2 GHz CPU and DDR3 RAM.
$search and $replace each contain 7 entries. Even so, the replacement does not always work: in some cases $search gives false positives or false negatives.
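One way fixed-string replacement produces false negatives: once the input is escaped, a tag with attributes no longer matches the literal search string, while its closing tag still does. The two search/replace pairs below are hypothetical, since the question's actual 7 pairs are not shown.

```php
<?php
// Hypothetical search/replace pairs; the question's real 7 pairs are not shown.
$search  = ['&lt;b&gt;', '&lt;/b&gt;'];
$replace = ['<b>', '</b>'];

$str = '<b class="note">Hello</b>';
$escaped = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
// $escaped is: &lt;b class=&quot;note&quot;&gt;Hello&lt;/b&gt;

echo str_replace($search, $replace, $escaped);
// The opening tag (it has an attribute) is not restored, but the closing tag is:
// &lt;b class=&quot;note&quot;&gt;Hello</b>
```

The result is an unbalanced closing tag, which is why matching tags as a pattern rather than as fixed strings tends to be more reliable.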
To clarify, I apply these changes when saving to the DB, not when retrieving from it.
You might try this code (should be improved):
The regular expression looks (or should look) for two kinds of strings:
- <tag attributes>content</tag>, with the tag part being the same for the opening and closing tag, and attributes and content being optional;
- <tag attributes/>, with attributes being optional.
Tags are listed in the (i|a) part for <tag></tag> types of tags and in (?:img) for <tag/> types of tags.
If it finds matching tags, it passes the content to the callback() function, which converts it back by using htmlspecialchars_decode(). This is necessary for decoding quotes and other encoded characters in the list of attributes.
I’m not sure if it works in all cases, i.e., whether it matches all the necessary tags. If it works in general, then the pattern and the callback() function should be improved so that callback() decodes only the <, > characters and the list of attributes; the content of tags (i.e., the "some link" part in <a href='#'>some link</a>) must not be decoded.
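A sketch of the approach described above, including the suggested improvement: escape everything first, then restore only the whitelisted tags, decoding the tag markup and attributes but leaving the content escaped. This is a reconstruction under assumptions, not the answer's original code; the function name restore_allowed_tags() and both regexes are hypothetical. It does not handle nested tags of the same name, and decoded attributes are trusted as-is (so onclick or javascript: URLs would survive).

```php
<?php
// Hypothetical reconstruction: escape everything, then restore whitelisted tags.
function restore_allowed_tags(string $str): string
{
    $escaped = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

    // <tag attributes>content</tag> for the tags in (i|a); attributes optional.
    $escaped = preg_replace_callback(
        '~&lt;(i|a)(\s(?:(?!&gt;).)*)?&gt;(.*?)&lt;/\1&gt;~s',
        function (array $m): string {
            // Decode only the attributes; $m[3] (the content) stays escaped.
            $attrs = htmlspecialchars_decode($m[2], ENT_QUOTES);
            return '<' . $m[1] . $attrs . '>' . $m[3] . '</' . $m[1] . '>';
        },
        $escaped
    );

    // <tag attributes/> for the tags in (?:img); attributes optional.
    return preg_replace_callback(
        '~&lt;(img)(\s(?:(?!&gt;).)*?)?\s*/&gt;~s',
        function (array $m): string {
            // A trailing unmatched group is omitted from $m, hence the ?? ''.
            $attrs = htmlspecialchars_decode($m[2] ?? '', ENT_QUOTES);
            return '<' . $m[1] . $attrs . ' />';
        },
        $escaped
    );
}

echo restore_allowed_tags('<a href="#">some link</a> <img src="p.png" /> <script>x</script>');
// <a href="#">some link</a> <img src="p.png" /> &lt;script&gt;x&lt;/script&gt;
```

Leaving the content escaped means that text such as 1 < 2 inside an allowed tag still comes out as 1 &lt; 2, which is the behaviour the answer asks for.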