I have the following text string and regex pattern in a C program:
char text[] = " identification division. ";
char pattern[] = "^(.*)(identification *division)(.*)$";
Using the regexec() library function, I got the following results:
String: identification division.
Pattern: ^(.*)(identification *division)(.*)$
Total number of subexpressions: 3
OK, pattern has matched ...
begin: 0, end: 37, match: identification division.
subexpression 1 begin: 0, end: 8, match:
subexpression 2 begin: 8, end: 35, match: identification division
subexpression 3 begin: 35, end: 37, match: .
Since the regex engine matches greedily and the first capture group (.*) matches any number of characters (except newline characters), why doesn't it match all the way to the end of the text string (up to the '.') as opposed to matching only the first 8 spaces?
Does each capture group have to be matched?
Are there any rules on how the capture group matches the text string?
Thanks.
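Just as you said, if the greedy group (.*) had consumed the whole string, the rest of the regex would have nothing left to match, and the regex as a whole would fail to match the string. Greedy therefore means "as long as possible while the rest of the pattern can still match": the first group takes the 8 leading spaces because that is the longest prefix that still leaves (identification *division) and the final (.*)$ able to match. So yes, each capture group (and every other part of the pattern) needs to be matched, although a group such as (.*) is allowed to match the empty string. This is exactly what you specified in your regex.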
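To see the offsets for yourself, here is a minimal sketch using POSIX regcomp() and regexec() with REG_EXTENDED. The helper names match_groups and print_groups are made up for illustration and are not part of any library, and the exact spacing of the test strings is an assumption (the first one is reconstructed from the offsets reported in the question).

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as a POSIX extended regex, run it
   against `text`, and store up to `ngroups` offset pairs in `groups`
   (index 0 is the whole match). Returns 0 on a match, non-zero otherwise. */
int match_groups(const char *pattern, const char *text,
                 regmatch_t *groups, size_t ngroups)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;                      /* pattern did not compile */

    rc = regexec(&re, text, ngroups, groups, 0);
    regfree(&re);
    return rc;                          /* 0 on match, REG_NOMATCH otherwise */
}

/* Hypothetical helper: print the begin/end offsets and text of each group. */
void print_groups(const char *text, const regmatch_t *groups, size_t ngroups)
{
    for (size_t i = 0; i < ngroups; i++) {
        if (groups[i].rm_so == -1)
            continue;                   /* group did not participate */
        int begin = (int)groups[i].rm_so;
        int end = (int)groups[i].rm_eo;
        printf("group %zu: begin %d, end %d, match: \"%.*s\"\n",
               i, begin, end, end - begin, text + begin);
    }
}
```

Calling match_groups("^(.*)(identification *division)(.*)$", text, groups, 4) with a text of 8 spaces, "identification", 5 spaces, "division" and ". " should give group 1 the range 0 to 8, group 2 the range 8 to 35, and group 3 the range 35 to 37, as in the question. With a text that contains the phrase twice, such as "identification division identification division", the greedy first group extends over the first occurrence and ends at offset 24, leaving only the second occurrence for group 2.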
Try the following string instead, running the code once with a reluctant first group and once with a greedy one, and you will see the difference.