Thanks in advance for your help.
I need to replace all special characters in my application's text with their numeric HTML entity equivalents.
For example:
‡, •, -, ‰, € and ™
Become:
&#8225;, &#8226;, &#45;, &#8240;, &#8364; and &#8482;
There are lots of questions out there already, but they do it the other way round.
I have all of the chars I want to convert in a JSON object (this is just a snapshot of a much larger list, just to prove my JSON is good):
{"ch":"‘","sub":"&#8216;"},
{"ch":"’","sub":"&#8217;"},
{"ch":"‚","sub":"&#8218;"},
{"ch":"“","sub":"&#8220;"},
{"ch":"”","sub":"&#8221;"},
{"ch":"„","sub":"&#8222;"},
{"ch":"†","sub":"&#8224;"},
{"ch":"‡","sub":"&#8225;"},
{"ch":"•","sub":"&#8226;"},
...
And I currently loop through (using Prototype here) and attempt to replace them:
oJSONItems.each(function(o){
    var oRG = new RegExp(o.ch,'g');
    oText = oText.replace(oRG,o.sub);
});
Some are being replaced, but some are not…
‡
•
-
‰
€
™
More than anything I need to know why chars like ™ are failing to be converted.
Thanks.
Rather than coding for specific entities, how about one replacement that handles anything outside the original 7-bit ASCII range:
(The regexp matches anything that’s not white space or a “normal” ASCII character)
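A minimal sketch of that idea, assuming a hypothetical function name `encodeNonAscii` and the character class `[^\s!-~]` (not whitespace, not printable ASCII). Note that `\s` in JavaScript also covers some non-ASCII spaces such as the non-breaking space, so those would be left untouched by this version:

```javascript
// Hypothetical helper: replace every character that is neither whitespace
// nor printable ASCII ('!' through '~') with its numeric HTML entity.
function encodeNonAscii(text) {
  return text.replace(/[^\s!-~]/g, function (c) {
    // charCodeAt(0) returns the UTF-16 code unit, which equals the code
    // point for characters in the Basic Multilingual Plane. Characters
    // outside it would come out as two surrogate entities.
    return '&#' + c.charCodeAt(0) + ';';
  });
}

var encoded = encodeNonAscii('\u2021, \u2022, -, \u2030, \u20AC and \u2122');
console.log(encoded);
```

Because the entity is computed from the character itself, no lookup table is needed and characters missing from a hand-written list cannot slip through.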
Alternatively, write your map so that the keys are the characters you want to replace, and the values are the entities:
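A sketch of the map-based variant might look like the following. The names `entityMap` and `encodeWithMap` are made up for illustration, and the table only holds the handful of entries shown in the question; a real one would carry the full list:

```javascript
// Hypothetical lookup table: keys are the characters to replace,
// values are the entities to substitute for them.
var entityMap = {
  '\u2018': '&#8216;', // left single quotation mark
  '\u2019': '&#8217;', // right single quotation mark
  '\u201A': '&#8218;', // single low-9 quotation mark
  '\u201C': '&#8220;', // left double quotation mark
  '\u201D': '&#8221;', // right double quotation mark
  '\u201E': '&#8222;', // double low-9 quotation mark
  '\u2020': '&#8224;', // dagger
  '\u2021': '&#8225;', // double dagger
  '\u2022': '&#8226;'  // bullet
};

function encodeWithMap(text) {
  // One pass over the text: each candidate character is looked up in the
  // map and left unchanged if the map has no entry for it.
  return text.replace(/[^\s!-~]/g, function (c) {
    return Object.prototype.hasOwnProperty.call(entityMap, c) ? entityMap[c] : c;
  });
}

console.log(encodeWithMap('\u201Cquoted\u201D \u2022 item'));
```

The trade-off compared with computing the code directly is that any character absent from the map passes through unconverted.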
Note that both versions only make the regexp call once, rather than once for each possible entity in your set. This should make the code somewhat faster than your loop-based method.