I was working on a parser that reads HTML; however, the code that splits it causes “l”s to be inserted as every other entry of the resulting array.
The regexp is this:
textarea.value.split(/(?=<(.|\n)+>)/)
What it’s supposed to do is split the text into opening, closing, and self-closing HTML/XML tags, without splitting on tabs or line terminators (those should just stay appended to the tag they were split with).
Can anyone give me some insight into what’s happening?
You can view and edit the code in action here:
http://jsfiddle.net/termtm/ew7Mt/2/
Just look in the console for the result it produces.
EDIT: MaxArt is right: the “l” in the last tag, </html>, is why the inserted entries are “l”s.
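For reference, when the separator passed to `String.prototype.split` is a regular expression that contains capturing groups, the text captured by those groups is spliced into the output array between the pieces. The short strings below are made up purely to illustrate that rule:

```javascript
// A separator with no capturing group: only the pieces are returned.
const plain = "a1b2c".split(/\d/);
console.log(plain); // ['a', 'b', 'c']

// A capturing group: each captured separator is inserted between the pieces.
const captured = "a1b2c".split(/(\d)/);
console.log(captured); // ['a', '1', 'b', '2', 'c']

// A non-capturing group (?:...) still groups, but nothing is inserted.
const nonCapturing = "a1b2c".split(/(?:\d)/);
console.log(nonCapturing); // ['a', 'b', 'c']
```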
Try this:
But… what Alnitak said. A fully-fledged HTML parser based on regexps, especially given the limited regexp feature support in JavaScript, would be a terrible (and slow) mess.
I have yet to find out the reason for the odd behaviour you found. Notice that “l” (ell) is the last letter of “</html>”, i.e., the last tag of your HTML code. Change it to something else and you’ll notice the letters change.
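A sketch that reproduces the behaviour on a made-up snippet (the `html` string is an assumption standing in for `textarea.value`, not the asker's actual input). As far as I can tell, `(.|\n)` is a capturing group, and because `+` is greedy, every lookahead match runs to the last `>` in the whole string, so the group's final capture is the character just before it: the “l” of `</html>`. Using a non-capturing group `(?:.|\n)` keeps the same split points without inserting anything:

```javascript
// Hypothetical input standing in for textarea.value.
const html = "<html>\n<body>\n</body>\n</html>";

// Original pattern: split() inserts the group's last capture (the character
// just before the final ">" in the string) between the pieces.
const withCapture = html.split(/(?=<(.|\n)+>)/);
console.log(withCapture);
// ['<html>\n', 'l', '<body>\n', 'l', '</body>\n', 'l', '</html>']

// Changing the last tag changes the inserted letter.
const renamedLastTag = "<html></htmx>".split(/(?=<(.|\n)+>)/);
console.log(renamedLastTag); // ['<html>', 'x', '</htmx>']

// Same lookahead with a non-capturing group: identical split points, no extras,
// and the line terminators stay attached to the preceding tag.
const withoutCapture = html.split(/(?=<(?:.|\n)+>)/);
console.log(withoutCapture);
// ['<html>\n', '<body>\n', '</body>\n', '</html>']
```

`[\s\S]` is a common alternative to `(?:.|\n)`; unlike `.`, it also matches `\r`.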