Some Context
From JavaScript: The Definitive Guide:
When regexp is a global regular expression, however, exec() behaves in a slightly more complex way. It begins searching string at the character position specified by the lastIndex property of regexp. When it finds a match, it sets lastIndex to the position of the first character after the match.
I think anyone who works with JavaScript RegExps on a regular basis will recognize this passage. However, I have found a strange behavior in this method.
The Problem
Consider the following code:
>> rx = /^(.*)$/mg
>> tx = 'foo\n\nbar'
>> rx.exec(tx)
[foo,foo]
>> rx.lastIndex
3
>> rx.exec(tx)
[,]
>> rx.lastIndex
4
>> rx.exec(tx)
[,]
>> rx.lastIndex
4
>> rx.exec(tx)
[,]
>> rx.lastIndex
4
The RegExp seems to get stuck on the second line and doesn’t increment the lastIndex property. This seems to contradict The Rhino Book. If I set it myself as follows, it continues and eventually returns null as expected, but it seems like I shouldn’t have to.
>> rx.lastIndex = 5
5
>> rx.exec(tx)
[bar,bar]
>> rx.lastIndex
8
>> rx.exec(tx)
null
Conclusion
Obviously I can increment the lastIndex property any time the match is the empty string. However, being the inquisitive type, I want to know why it isn’t incremented by the exec method. Why isn’t it?
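A minimal sketch of that manual workaround, assuming a hypothetical helper named collectMatches (my own name, not part of any library). It loops over exec() and bumps lastIndex by one whenever the match is the empty string, so the search cannot stall on the blank line:

```javascript
// Hypothetical helper: gather group 1 of every match of a global regex,
// stepping past zero-width matches by hand.
function collectMatches(rx, text) {
    var results = [], m;
    rx.lastIndex = 0; // always start from the beginning of the text
    while ((m = rx.exec(text)) !== null) {
        results.push(m[1]);
        if (m[0].length === 0) {
            rx.lastIndex++; // empty match: exec() left lastIndex at m.index
        }
    }
    return results;
}

var lines = collectMatches(/^(.*)$/mg, 'foo\n\nbar');
```

With the sample string from the question, lines should contain the three entries 'foo', '' and 'bar'.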
Notes
I have observed this behavior in Chrome and Firefox. It seems to happen only when there are adjacent newlines.
[edit]
Tomalak says below that changing the pattern to /^(.+)$/gm will cause the expression not to get stuck, but the blank line is ignored. Can this be altered to still match the line? Thanks for the answer Tomalak!
[edit]
Using the following pattern and using group 1 works for all strings I can think of. Thanks again to Tomalak.
/^(.*)((\r\n|\r|\n)|$)/gm
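As a rough illustration of how this pattern might be used, the sketch below runs it over the sample string from the question and keeps only group 1. The variable names are mine, and the input deliberately has no trailing newline:

```javascript
var rx = /^(.*)((\r\n|\r|\n)|$)/gm;
var tx = 'foo\n\nbar';
var found = [], m;

// Every match consumes the line terminator (or stops at the end of the
// string), so no match before the end is zero-width and lastIndex advances.
while ((m = rx.exec(tx)) !== null) {
    found.push(m[1]); // group 1 is the line without its terminator
}
```

Because the terminator is part of the match, the blank line produces a one-character match whose group 1 is the empty string.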
[edit]
The previous pattern returns the blank line. However, if you don’t care about the blank lines, Tomalak gives the following solution, which I think is cleaner.
/^(.*)[\r\n]*/gm
[edit]
Both of the previous two solutions get stuck on trailing newlines, so you have to either strip them or increment lastIndex manually.
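A small sketch of the trailing-newline case, using the second pattern and a made-up input. It first shows the stripping approach, then shows exec() returning the same empty match at the end of the unstripped string:

```javascript
var rx = /^(.*)[\r\n]*/gm;
var tx = 'foo\nbar\n';

// Option 1: strip trailing line terminators before matching.
var stripped = tx.replace(/[\r\n]+$/, '');
var lines = [], m;
while ((m = rx.exec(stripped)) !== null) {
    lines.push(m[1]);
}

// Without stripping, the empty "line" after the final '\n' matches
// with zero width, and lastIndex stays put on every further call.
rx.lastIndex = 0;
rx.exec(tx); // 'foo\n'
rx.exec(tx); // 'bar\n'
var a = rx.exec(tx), afterA = rx.lastIndex;
var b = rx.exec(tx), afterB = rx.lastIndex;
```

Here a and b are both empty matches at the end of tx, and afterA equals afterB, which is the stuck state described above.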
[edit]
I found a great article detailing the cross-browser issues with lastIndex over at Flagrant Badassery. Besides the awesome blog name, the article gave me a much more in-depth understanding of the issue along with a good cross-browser solution. The solution is as follows:
var rx = /^/gm, tx = 'A\nB\nC', m;
while(m = rx.exec(tx)){
    if(!m[0].length && rx.lastIndex > m.index){
        --rx.lastIndex;
    }
    foo();
    if(!m[0].length){
        ++rx.lastIndex;
    }
}
The problem is that the dot in /^(.*)$/mg does not match newline characters, but with your 'm' switch you make '^' and '$' anchor at newline characters. That means the ‘nothing’ between '\n' and '\n' can be matched successfully with '(.*)'.
Since this match is of zero width, the lastIndex property cannot advance. Try:
/^(.+)$/gm
EDIT: To match the blank lines as well, do this:
or
…and just go for match group 1.
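To make the zero-width argument concrete, here is a short sketch (variable names are mine) checking that after each successful exec() on the question's pattern, lastIndex equals the match index plus the match length. For an empty match that sum is just the index, so the next search starts at the same place:

```javascript
var rx = /^(.*)$/mg;
var tx = 'foo\n\nbar';

var first = rx.exec(tx);          // matches 'foo' at index 0
var afterFirst = rx.lastIndex;    // index + length of the first match

var second = rx.exec(tx);         // empty match on the blank line
var afterSecond = rx.lastIndex;   // index + 0, so no progress

var third = rx.exec(tx);          // same empty match again
var afterThird = rx.lastIndex;
```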