The Ruby (1.9.3) documentation seems to imply that scan is equivalent to =~ except that
- scan returns multiple matches, while =~ returns only the first occurrence, and
- scan returns the match data, while =~ returns the index.
However, in the following example, the two methods seem to return different results for the same string and expression. Why is that?
1.9.3p0 :002 > str = "Perl and Python - the two languages"
=> "Perl and Python - the two languages"
1.9.3p0 :008 > exp = /P(erl|ython)/
=> /P(erl|ython)/
1.9.3p0 :009 > str =~ exp
=> 0
1.9.3p0 :010 > str.scan exp
=> [["erl"], ["ython"]]
If the index of the first match is 0, shouldn’t scan return “Perl” and “Python” instead of “erl” and “ython”?
Thanks
When given a regular expression without capturing groups, scan will return an array of strings, where each string represents a match of the regular expression. If you use scan(/P(?:erl|ython)/) (which is the same as your regex except without capturing groups), you’ll get ["Perl", "Python"], which is what you expect.

However, when given a regex with capturing groups, scan will return an array of arrays, where each sub-array contains the captures of a given match. So if you have, for example, the regex (\w*):(\w*), you’ll get an array of arrays where each sub-array contains two strings: the part before the colon and the part after the colon. And in your example each sub-array contains one string: the part matched by (erl|ython).
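
To make the difference concrete, here is a small sketch using the string from the question. The "a:1 b:2" input and the variable names are made up for illustration, and wrapping the whole pattern in an outer group is just one possible way to keep the full match alongside the captures:

```ruby
str = "Perl and Python - the two languages"

# =~ only reports where the first match starts.
index = str =~ /P(erl|ython)/            # => 0

# No capturing groups: scan returns the full text of every match.
whole = str.scan(/P(?:erl|ython)/)       # => ["Perl", "Python"]

# One capturing group: scan returns one sub-array of captures per match.
captures = str.scan(/P(erl|ython)/)      # => [["erl"], ["ython"]]

# Two capturing groups: each sub-array holds both captures.
# (The input string here is a made-up example.)
pairs = "a:1 b:2".scan(/(\w*):(\w*)/)    # => [["a", "1"], ["b", "2"]]

# An outer group around the whole pattern makes the full match the first capture.
both = str.scan(/(P(erl|ython))/)        # => [["Perl", "erl"], ["Python", "ython"]]

p index, whole, captures, pairs, both
```

If you only need the whole matches, the non-capturing form (?:...) is usually the simplest choice.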