My code :
Pattern pattern = Pattern.compile("a?");
Matcher matcher = pattern.matcher("ababa");
while (matcher.find()) {
    System.out.println(matcher.start() + "[" + matcher.group() + "]" + matcher.end());
}
Output :
0[a]1
1[]1
2[a]3
3[]3
4[a]5
5[]5
What I know :
- “a?” stands for zero or one occurrence of the character ‘a’.
Java API says :
- matcher.start() returns the start index of the previous match.
- matcher.end() returns the offset after the last character matched.
- matcher.group() returns the input subsequence matched by the previous
match. For a matcher m with input sequence s, the expressions
m.group() and s.substring(m.start(), m.end()) are equivalent. Note that
some patterns, for example a*, match the empty string. This method
will return the empty string when the pattern successfully matches
the empty string in the input.
What I want to know:
- In which situations does the regex engine encounter zero
occurrences of a given character (here, the character ‘a’)?
- In those situations, what values do the start(), end() and group()
methods of the matcher actually return? I have quoted what the Java
API says, but I’m a little unclear about how it applies in a practical
situation like the one above.
The ? is a greedy quantifier, so at each position it first tries to match one occurrence before trying zero occurrences. It is a bit more complicated than that, but that is the main idea. When one occurrence cannot match, which in your string happens at each 'b' (indexes 1 and 3) and at the end of the input (index 5), it falls back to zero occurrences and matches the empty string.
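To watch the "one first, then zero" behaviour position by position, you can anchor a match attempt at every index in turn. This is only a rough sketch: GreedyProbe and probe are names I made up for illustration, and it uses Matcher.region() with Matcher.lookingAt() instead of find(), so it shows what the pattern can match at each index rather than how find() walks the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class GreedyProbe {
    // For every index i (including the end of the input), try to match
    // the pattern starting exactly at i and record what was consumed.
    static List<String> probe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> results = new ArrayList<>();
        for (int i = 0; i <= input.length(); i++) {
            m.region(i, input.length());      // only look at input from index i onward
            if (m.lookingAt()) {              // match must begin at the region start
                results.add(i + "[" + m.group() + "]" + m.end());
            } else {
                results.add(i + " no match");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // At indexes 0, 2, 4 the greedy attempt takes the 'a';
        // at indexes 1, 3, 5 there is no 'a', so the empty string matches.
        probe("a?", "ababa").forEach(System.out::println);
    }
}
```

For "a?" on "ababa" this prints 0[a]1, 1[]1, 2[a]3, 3[]3, 4[a]5 and 5[]5, one per line. An attempt never fails outright here, because zero occurrences is always an acceptable match for a?.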
As for start(), end() and group(): they return where the match starts, where it ends, and the text that was matched. So for the first zero-occurrence match in your string, you get 1, 1 and the empty string. I am not sure this fully answers your question.
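If it helps, here is a small sketch that labels each match found by find() as empty or non-empty and checks the documented equivalence between group() and substring(start(), end()). The class name EmptyMatchReport and the method report are hypothetical, chosen just for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class EmptyMatchReport {
    static List<String> report(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            // The API documents group() as equivalent to substring(start(), end()).
            String viaSubstring = input.substring(m.start(), m.end());
            if (!viaSubstring.equals(m.group())) {
                throw new IllegalStateException("group() differs from substring");
            }
            // A zero-occurrence match has start() == end() and an empty group().
            String kind = (m.start() == m.end()) ? "empty" : "non-empty";
            lines.add(m.start() + ".." + m.end() + " [" + m.group() + "] " + kind);
        }
        return lines;
    }

    public static void main(String[] args) {
        report("a?", "ababa").forEach(System.out::println);
    }
}
```

With "a?" and "ababa" the empty matches are reported at 1..1, 3..3 and 5..5, which are exactly the lines in your output where start() and end() are equal.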