Let’s say we have an HTML string like “2 &lt; 4”.
How can I determine whether it contains any of these entity sequences?
I’ve found HTML::Entities on CPAN, but it doesn’t provide a ‘check’ method.
Details: I’m fixing a ‘truncate’ method so that it doesn’t leave a corrupted string like “2 &l” and doesn’t do unnecessary work. It should look like this:
$s = HTML::Entities::decode_entities ($s) if $has_ext_chars;
$s = substr ($s, 0, $len - 3) . '...' if length $s > $len;
$s = HTML::Entities::encode_entities ($s, "‚„-‰‹‘-™›\xA0¤¦§©«-®°-±µ-·»") if $has_ext_chars;
How do I determine $has_ext_chars?
A complete list of character entities can be found in the W3C reference.
You also have to match numeric character references:
\&#u?\d+; and \&#x[a-fA-F0-9]+;
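
Putting that together, here is a minimal sketch of a check in plain Perl. The helper name has_entities and the named-entity pattern are my own assumptions, not part of HTML::Entities; the two numeric patterns are the ones given above. It only tests whether something shaped like an entity is present and does not validate names against the W3C list.

```perl
use strict;
use warnings;

# Hypothetical pattern: anything shaped like a character reference.
my $ENTITY_RE = qr{
    &
    (?: [A-Za-z][A-Za-z0-9]*    # named reference such as &lt; (name not validated)
      | \#u?\d+                 # decimal pattern from the answer above
      | \#x[a-fA-F0-9]+         # hexadecimal pattern from the answer above
    )
    ;
}x;

# Hypothetical helper: returns 1 if the string contains something that
# looks like an entity, 0 otherwise.
sub has_entities {
    my ($s) = @_;
    return $s =~ $ENTITY_RE ? 1 : 0;
}

print has_entities("2 &lt; 4") ? "yes" : "no", "\n";
print has_entities("2 < 4")    ? "yes" : "no", "\n";
```

In the truncate code you would then set $has_ext_chars = has_entities($s) before the first line. A false positive on an unknown name (for example “AT&T;”) should only cost an unnecessary decode/encode round trip, since as far as I know decode_entities leaves unrecognised names untouched.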