Can anyone explain:
- Why do the two patterns used below give different results? (answered below)
- Why does the 2nd example give a group count of 1 but report the start and end of group 1 as -1?
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void testGroups() throws Exception
{
    String TEST_STRING = "After Yes is group 1 End";
    {
        // 1st example: the alternation is enclosed in a non-capturing group
        Pattern p;
        Matcher m;
        String pattern = "(?:Yes|No)(.*)End";
        p = Pattern.compile(pattern);
        m = p.matcher(TEST_STRING);
        boolean f = m.find();
        int count = m.groupCount();
        int start = m.start(1);
        int end = m.end(1);
        System.out.println("Pattern=" + pattern + "\t Found=" + f + " Group count=" + count +
            " Start of group 1=" + start + " End of group 1=" + end);
    }
    {
        // 2nd example: the alternation is at the top level of the pattern
        Pattern p;
        Matcher m;
        String pattern = "(?:Yes)|(?:No)(.*)End";
        p = Pattern.compile(pattern);
        m = p.matcher(TEST_STRING);
        boolean f = m.find();
        int count = m.groupCount();
        int start = m.start(1);
        int end = m.end(1);
        System.out.println("Pattern=" + pattern + "\t Found=" + f + " Group count=" + count +
            " Start of group 1=" + start + " End of group 1=" + end);
    }
}
Which gives the following output:
Pattern=(?:Yes|No)(.*)End Found=true Group count=1 Start of group 1=9 End of group 1=21
Pattern=(?:Yes)|(?:No)(.*)End Found=true Group count=1 Start of group 1=-1 End of group 1=-1
To summarise,
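1) The two patterns give different results because of operator precedence: alternation (|) binds more loosely than concatenation, so in the second pattern it splits the whole expression into two alternatives.
   (?:Yes|No)(.*)End matches (Yes or No) followed by .*End
   (?:Yes)|(?:No)(.*)End matches (Yes) or (No followed by .*End)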
2) The second pattern gives a group count of 1 but a start and end of -1 because of the (not necessarily intuitive) meanings of the results returned by the Matcher method calls.
   - Matcher.find() returns true if a match was found. In your case the match was on the (?:Yes) part of the pattern.
   - Matcher.groupCount() returns the number of capturing groups in the pattern regardless of whether the capturing groups actually participated in the match. In your case only the non-capturing (?:Yes) part of the pattern participated in the match, but the capturing (.*) group was still part of the pattern, so the group count is 1.
   - Matcher.start(n) and Matcher.end(n) return the start and end index of the subsequence matched by the nth capturing group. In your case, although an overall match was found, the (.*) capturing group did not participate in the match and so did not capture a subsequence, hence the -1 results.
3) (Question asked in comment.) In order to determine how many capturing groups actually captured a subsequence, iterate Matcher.start(n) from 0 to Matcher.groupCount(), counting the number of non -1 results. (Note that Matcher.start(0) is the capturing group representing the whole pattern, which you may want to exclude for your purposes.)
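
As a rough sketch of point 3, the loop could look something like the following. The class name GroupParticipation and the helpers countParticipatingGroups and participatingAfterFind are made up for illustration; only Pattern, Matcher, find(), groupCount() and start(int) come from java.util.regex. This version starts at group 1, on the assumption that you want to leave out group 0 (the whole match).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupParticipation {

    // Hypothetical helper: counts the capturing groups (1..groupCount())
    // that actually captured a subsequence in the most recent successful match.
    // Must only be called after find()/matches()/lookingAt() returned true,
    // otherwise start(int) throws IllegalStateException.
    static int countParticipatingGroups(Matcher m) {
        int participating = 0;
        // Start at 1 to skip group 0, which stands for the whole match.
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.start(i) != -1) {
                participating++;
            }
        }
        return participating;
    }

    // Hypothetical helper: runs find() once and returns the count,
    // or -1 if the pattern did not match anywhere in the input.
    static int participatingAfterFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? countParticipatingGroups(m) : -1;
    }

    public static void main(String[] args) {
        String input = "After Yes is group 1 End";
        // Alternation enclosed in a group: (.*) takes part in the match.
        System.out.println(participatingAfterFind("(?:Yes|No)(.*)End", input)); // prints 1
        // Top-level alternation: the match is on (?:Yes) alone, so (.*) is skipped.
        System.out.println(participatingAfterFind("(?:Yes)|(?:No)(.*)End", input)); // prints 0
    }
}
```

With the test string from the question, both patterns report a groupCount() of 1, but only the first one has a participating group, which lines up with the -1 values in the output above.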