What’s a good implementation for unescaping numeric HTML/XML entities, e.g. `&#10;`, and replacing them with the ASCII equivalent?
Expressed as a unit test:
```lua
local orig = "It&#39;s the &#34;end&#34; &amp;ok;&#10;"
local fixd = unescape(orig) -- Implement this
assert( fixd == "It's the \"end\" &ok;\n" )
```
Here’s a simple implementation that also handles the core named XML entities:
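
A minimal sketch of that approach, reconstructed here rather than copied from the original post: one `string.gsub` pass per entity form. Two details are my own assumptions. First, `utf8.char` (Lua 5.3+) is used when available so code points above 127 come out as UTF-8, falling back to `string.char` (bytes 0–255) on older Lua, and numeric entities outside that range are left as-is. Second, the `&amp;` pass runs last so that an escaped entity such as `&amp;lt;` becomes the text `&lt;` rather than `<`.

```lua
-- Sequential gsub passes, one per entity form.
local toChar  = utf8 and utf8.char or string.char  -- utf8.char needs Lua 5.3+
local maxCode = utf8 and 0x10FFFF or 255           -- largest code toChar accepts

-- Hypothetical helper: character for code point n, or nothing if out of range
-- (returning nothing from a gsub callback keeps the original match).
local function codepoint(n)
  if n and n <= maxCode then return toChar(n) end
end

function unescape(str)
  str = str:gsub("&lt;", "<")
  str = str:gsub("&gt;", ">")
  str = str:gsub("&quot;", '"')
  str = str:gsub("&apos;", "'")
  -- Decimal entities such as &#39;
  str = str:gsub("&#(%d+);", function(d) return codepoint(tonumber(d)) end)
  -- Hexadecimal entities such as &#x27;
  str = str:gsub("&#[xX](%x+);", function(h) return codepoint(tonumber(h, 16)) end)
  -- &amp; must come last, or "&amp;lt;" would wrongly become "<".
  str = str:gsub("&amp;", "&")
  return str
end
```

The unit test from the question passes with this version.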
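However, note that this fails for one pathological case: a numeric ampersand entity followed by the text `amp;`. The input `&#38;amp;` should decode to the literal text `&amp;`, but because the substitutions run one after another, the `&` produced by decoding `&#38;` combines with the following `amp;` into a new `&amp;` that a later pass decodes again, leaving just `&`.
We can fix this edge case by handling all entities at once, but the code gets a good bit uglier: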
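
A sketch of the single-pass version (my own, not the original answer's code): one pattern, `&(#?)([xX]?)(%w+);`, matches named, decimal, and hexadecimal entities, and a callback decides what each becomes. Because `gsub` never re-scans its own replacements, `&#38;amp;` now yields `&amp;`. It assumes the same `utf8.char`/`string.char` fallback as above; returning `nil` from the callback tells `gsub` to keep the original match, so unknown or malformed entities are left untouched.

```lua
-- Single pass: one gsub matches every entity, so text produced by one
-- replacement is never re-scanned by another.
local toChar  = utf8 and utf8.char or string.char  -- utf8.char needs Lua 5.3+
local maxCode = utf8 and 0x10FFFF or 255           -- largest code toChar accepts

function unescape(str)
  local named = { lt = "<", gt = ">", amp = "&", quot = '"', apos = "'" }
  -- Captures: "#" or "", "x"/"X" or "", and the entity body.
  return (str:gsub("&(#?)([xX]?)(%w+);", function(hash, x, body)
    if hash == "" then
      -- Named entity; [xX]? may have eaten a leading x, so rejoin it.
      return named[x .. body]                          -- nil keeps unknown names
    end
    local n
    if x == "" then
      n = body:match("^%d+$") and tonumber(body)       -- decimal, e.g. &#39;
    else
      n = body:match("^%x+$") and tonumber(body, 16)   -- hex, e.g. &#x27;
    end
    if n and n <= maxCode then return toChar(n) end    -- else keep original
  end))
end
```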
Finally, we can unwrap it for a little more speed:
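
A sketch of one possible reading of "unwrapping", under that assumption: the lookup table and the replacement callback are moved out of `unescape` so they are built once at load time instead of on every call, and `string.gsub` and `string.match` are cached in locals. It still decodes every entity in a single `gsub` pass; the saving is one table and one closure allocation per call. The `utf8.char` fallback is again an assumption.

```lua
-- Hoisted variant: table, callback, and library functions are created once.
local gsub, match, tonumber = string.gsub, string.match, tonumber
local toChar  = utf8 and utf8.char or string.char  -- utf8.char needs Lua 5.3+
local maxCode = utf8 and 0x10FFFF or 255           -- largest code toChar accepts
local named   = { lt = "<", gt = ">", amp = "&", quot = '"', apos = "'" }

local function replaceEntity(hash, x, body)
  if hash == "" then
    return named[x .. body]                        -- nil keeps unknown names
  end
  local n
  if x == "" then
    n = match(body, "^%d+$") and tonumber(body)    -- decimal, e.g. &#39;
  else
    n = match(body, "^%x+$") and tonumber(body, 16) -- hex, e.g. &#x27;
  end
  if n and n <= maxCode then return toChar(n) end  -- else keep original
end

function unescape(str)
  -- Parentheses drop gsub's second return value (the match count).
  return (gsub(str, "&(#?)([xX]?)(%w+);", replaceEntity))
end
```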