The following Java code is intended to capture the word “abc”, but instead it gives “null”:
Pattern p = Pattern.compile("^.*(\\ba\\w*\\b)?.*$");
Matcher m = p.matcher("xxx abc yyy");
if (m.matches()) System.out.println(m.group(1));
If you remove the question mark, it correctly captures “abc”. The question mark is greedy, so I would have thought the original code should also give “abc”.
Thank you to anyone who can explain why!
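For reference, here is the question's snippet expanded into a complete program that runs the pattern with and without the `?` side by side. The class name `OptionalGroupRepro` and the helper `group1` are made up for this sketch. `Matcher.group(1)` returns null when group 1 did not take part in an otherwise successful match, which is what happens with the first pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupRepro {
    // Hypothetical helper: requires a full-string match and returns group 1,
    // which is null if the group did not participate in the match.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match for " + regex);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String input = "xxx abc yyy";
        // With "?": the whole string still matches, but group 1 is skipped.
        System.out.println(group1("^.*(\\ba\\w*\\b)?.*$", input)); // null
        // Without "?": the engine must backtrack into the first .* to satisfy the group.
        System.out.println(group1("^.*(\\ba\\w*\\b).*$", input));  // abc
    }
}
```

Note that `m.matches()` is true in both cases; only the group's participation differs.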
The `.*` at the beginning of your regex is greedy, so it initially matches as many characters as possible (the entire string). When the regex engine moves on to the capturing group, `\ba\w*\b` cannot match at the end of the string. Because the group is optional, it simply matches nothing. The trailing `.*$` then matches as well, so the overall match succeeds on the first attempt and the engine never backtracks to look for a word starting with `a`. Group 1 does not take part in the match, which is why `m.group(1)` returns null.

To fix this, just change the `.*` at the beginning to `.*?`, which still matches zero or more characters but tries to match as few as possible (lazy instead of greedy).

The other alternative would be to make your capturing group required by removing the `?` after it. This forces the regex engine to backtrack until the group matches. This probably isn't what you want, though, because it changes the meaning of the regex: strings that contain no word starting with `a` would no longer match at all.

edit: Looks like I really should have tested this! As it turns out, just changing the `.*` to `.*?` doesn't help here. With the lazy prefix, the optional group is first tried at the very beginning of the string, where it can't match, so it matches nothing, and the entire string is then matched by the `.*` at the end (even if you change that one to `.*?`).

Your best bet here is to just remove the `?` after the group so that the group is required. If you still want to match all strings but have the group be null for strings that don't match your group, you could use the following regex:
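One regex that behaves this way (an assumption on my part, not necessarily the one originally intended here) is `^(?:.*?(\ba\w*\b))?.*$`. The `?` now applies to a non-capturing group that contains both a lazy prefix and the required word, so before giving up on the optional part the engine tries every start position for the word. The class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalWordCapture {
    // Assumed pattern: the optional part is "lazy prefix + required word",
    // so the engine searches for the word before falling back to skipping it.
    static final Pattern WORD_OR_NOTHING = Pattern.compile("^(?:.*?(\\ba\\w*\\b))?.*$");

    // Returns the first word starting with 'a', or null if there is none.
    // Any input without line terminators matches, since the trailing .* can take it all.
    static String firstAWord(String input) {
        Matcher m = WORD_OR_NOTHING.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAWord("xxx abc yyy"));       // abc
        System.out.println(firstAWord("xxx yyy"));           // null
        System.out.println(firstAWord("xxx abc yyy apple")); // abc (first word, because of .*?)
    }
}
```

Compared with simply removing the `?` (`^.*(\ba\w*\b).*$`), this version also accepts strings with no such word. Because of the lazy `.*?` it captures the first matching word, whereas the greedy required version would capture the last one (`apple` for the third input).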