<html><body><script>
var matches = /(\w+)(\s*(\w+))?/.exec("aaa");
alert(matches.length);
alert(typeof(matches[3]));
</script></body></html>
I’m really new to regular expressions, so this may be a very easy question.
The regular expression above /(\w+)(\s*(\w+))?/ matches patterns like “aaa”, “123”, “my_var” or “aaa bbb”, “123 456”, “my_var my_value”.
For an expression like “aaa bbb”, matches = ["aaa bbb", "aaa", " bbb", "bbb"], but for an expression like “aaa”, matches = ["aaa", "aaa", ???, ???]
The first thing that surprised me is that matches.length = 4. I was expecting it to be 2, but I don’t see any document explaining what it should be. How does it work?
The second thing that surprised me is that the 2 “extra” matches behave differently in the 2 browsers I’ve tested this in:
- In Firefox 3.6.3, matches[2] and matches[3] are undefined.
- In Internet Explorer 6, matches[2] and matches[3] are an empty string.
Basically, how should I check if I’ve got a “short” (like “aaa”) or a “long” (like “aaa bbb”) expression?
The standard (ECMAScript 5) is pretty clear. The length should be 4, and IE is wrong (shocking, I know).
From §15.10.2.1, “NcapturingParens is the total number of left capturing parentheses.” You have 3.
“A State is an ordered pair (endIndex, captures) where endIndex is an integer and captures is an internal array of NcapturingParens values. […] The nth element of captures is either a String that represents the value obtained by the nth set of capturing parentheses or undefined if the nth set of capturing parentheses hasn’t been reached yet.”
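To make that wording concrete, here is a small sketch that runs the pattern from the question against both kinds of input. The variable names are mine, and the values in the comments are what a standards-conforming engine should produce (IE 6, as noted in the question, reports empty strings instead of undefined).

```javascript
var re = /(\w+)(\s*(\w+))?/; // 3 left capturing parentheses

var shortMatch = re.exec("aaa");
var longMatch = re.exec("aaa bbb");

// Index 0 holds the whole match; indices 1..3 hold the capture groups,
// whether or not each group took part in the match.
console.log(shortMatch.length); // 4
console.log(longMatch.length);  // 4

// Both optional groups participated for the "long" input.
console.log(longMatch[2]); // " bbb"
console.log(longMatch[3]); // "bbb"

// Neither optional group was reached for the "short" input.
console.log(shortMatch[2] === undefined); // true in a conforming engine
console.log(shortMatch[3] === undefined); // true in a conforming engine
```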
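§15.10.6.2, which describes exec, sets the length of the returned array to n + 1, where n is the number of captures (the same value as NcapturingParens).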
So the length should definitely be 4 (3 + 1), and captures that don’t get reached (like (\s*(\w+)) in your pattern) remain undefined. Luckily, undefined and “” (empty string) are both falsy. This means that they are false when treated as a boolean. So you can work around IE’s bug by doing if (matches[2]).
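As a rough sketch of that check, the function below tells a “short” expression from a “long” one without caring whether the browser reports undefined or an empty string. The name classifyExpression and its return values are made up for illustration; only the regular expression comes from the question.

```javascript
// Hypothetical helper: returns "long", "short", or null when nothing matches.
function classifyExpression(input) {
  var matches = /(\w+)(\s*(\w+))?/.exec(input);
  if (!matches) {
    return null; // exec returns null when the pattern does not match at all
  }
  // matches[2] is undefined (per the spec) or "" (IE's behaviour) when the
  // optional group did not participate; both are falsy.
  return matches[2] ? "long" : "short";
}

console.log(classifyExpression("aaa"));     // "short"
console.log(classifyExpression("aaa bbb")); // "long"
console.log(classifyExpression("!!!"));     // null
```

Testing truthiness rather than comparing against undefined or "" directly is what keeps the check independent of the browser difference.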