I am trying to figure out how to extract the text that is inside the parentheses and between single quotes in a statement. For example, given the following statement:
(I have a 'cat', 'hat');
I want the result to be
cat
hat
I managed to figure it out by experimenting with the different metacharacters described on Wikipedia (http://en.wikipedia.org/wiki/Regular_expression); however, I still have trouble understanding why it works.
I first tried this: \'(.*)\'
My understanding of this regex: I want to capture the characters between the single quotes ', where . matches any single character and * repeats it zero or more times.
This resulted in:
cat', 'hat
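For reference, here is a sketch of the greedy attempt reproduced in Python with the standard `re` module. Python is an assumption, since the question doesn't name a language; a single quote has no special meaning in a regex, so the backslashes are dropped here.

```python
import re

text = "(I have a 'cat', 'hat');"

# Greedy: .* first runs to the end of the string, then backtracks only as far
# as the LAST single quote, so the single match spans both words.
greedy = re.findall(r"'(.*)'", text)
print(greedy)  # ["cat', 'hat"]
```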
After playing around with a bunch of regex I finally ended up with this by accident: \'(.*?)\'
This resulted in:
cat
hat
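The lazy version, again sketched in Python's `re` module (an assumed host language), produces the two separate captures:

```python
import re

text = "(I have a 'cat', 'hat');"

# Lazy: .*? stops at the FIRST closing quote it can reach; findall then
# resumes scanning after that match, so each word is captured separately.
lazy = re.findall(r"'(.*?)'", text)
for word in lazy:
    print(word)  # prints cat, then hat

# A negated character class ("any character except a quote") gives the
# same result here without relying on laziness.
no_quotes = re.findall(r"'([^']*)'", text)
print(no_quotes)  # ['cat', 'hat']
```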
Why does this work? (In particular, I don’t understand how the ‘?’ works.)
The default behavior of a regular expression is to make the longest possible match in the string. This is referred to as being “greedy.”
On its own, ? means to match the preceding item (e.g., a character) zero or one times; that is, it makes the item optional. Placed directly after a quantifier, however, it changes how that quantifier behaves: *? is a special case called the “lazy star” that switches the regular expression evaluator into a “lazy” mode. In this mode, the evaluator first tries to skip the preceding item (and complete a match without it) before “going back” for it. The net result is what you observed: instead of finding the longest possible match (the default), it matches shorter strings that satisfy the search criteria.
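To make the “try to skip first, then go back” idea concrete, here is a hand-rolled sketch in Python. It is only an illustration of the trial order, not how any real regex engine is implemented. It matches a quote, then a run of characters, then a quote, trying run lengths longest-first (greedy) or shortest-first (lazy). The helper names `match_at` and `find_all_quoted` are hypothetical, and the sketch assumes the input contains no newlines, because `.` does not match a newline by default.

```python
import re


def match_at(text, start, lazy):
    """Try to match '.*' (greedy) or '.*?' (lazy) at index start.

    Hypothetical helper; assumes text contains no newlines.
    Returns (captured_text, index_of_closing_quote), or None.
    """
    if text[start] != "'":
        return None
    max_len = len(text) - start - 1  # most characters the star could consume
    # Greedy tries the longest run first and backs off one character at a time.
    # Lazy tries zero characters first and "goes back" for one more each time.
    lengths = range(max_len + 1) if lazy else range(max_len, -1, -1)
    for n in lengths:
        end = start + 1 + n  # where the closing quote would have to be
        if end < len(text) and text[end] == "'":
            return text[start + 1:end], end
    return None


def find_all_quoted(text, lazy):
    """Scan left to right like re.findall, collecting the captured text."""
    results = []
    i = 0
    while i < len(text):
        found = match_at(text, i, lazy)
        if found is None:
            i += 1  # no match starting here; try the next position
        else:
            inner, end = found
            results.append(inner)
            i = end + 1  # resume scanning after the closing quote
    return results


text = "(I have a 'cat', 'hat');"
print(find_all_quoted(text, lazy=False))  # ["cat', 'hat"]
print(find_all_quoted(text, lazy=True))   # ['cat', 'hat']

# Cross-check against Python's own regex engine.
assert find_all_quoted(text, lazy=False) == re.findall(r"'(.*)'", text)
assert find_all_quoted(text, lazy=True) == re.findall(r"'(.*?)'", text)
```

Both modes accept exactly the same strings; the only difference is the order in which candidate lengths are tried, and the first one that succeeds wins.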
A handy resource for testing out regular expressions is here, and a nice description of the various options, including lazy star, is here.