I am looking for the simplest way to mimic a behavior of C# regular expressions (which I really like) in Java’s regular expression API.
Basically, C# lets you retrieve every capture of a “nested” (repeated) group through the Captures property of the Group objects inside a Match. The feature is described here: MSDN.
For example, look at the code below:
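import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureDemo {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(abc((([\\d]+)\\s?)+)def\\s?)+?");
        Matcher matcher = pattern.matcher("abc123def abc567 341 123 789def");
        while (matcher.find()) {
            // group(3) holds only the last iteration of the repeated group
            System.out.println(matcher.group(3));
        }
    }
}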
The output in Java is:
123
789
As you can see, for the second match Java exposes only the last capture, 789. In C#, the Captures property would contain 567, 341, 123 and 789.
Unfortunately, in Java I seem to have access to only one capture of the nested group ([\\d]+), and I can’t seem to find a way to get its other captures. The question is basically “am I missing something here?”.
I know that I can just run another regex matcher on the matched string of a larger group. I would prefer, though, to keep everything in a single big regex that is filled with comments and easy to test in a (pretty cool) tool, “Regulator v2”. I also know that the example above can be done without the nested group, but it is only a rough example, based on a real-life log parser regex with more than 20 groups, meant to explain the problem.
EDIT: I added the complete Java example to avoid misinterpretation of the problem.
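For comparison, the two-pass workaround mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the class name NestedCaptures, the method captures, and the simplified OUTER and INNER patterns are made up for this example and are not the real log parser regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class NestedCaptures {
    // Simplified outer pattern (an assumption): group 1 is the whole run
    // of numbers between "abc" and "def".
    private static final Pattern OUTER = Pattern.compile("abc((?:\\d+\\s?)+)def");
    // Inner pattern applied to the text captured by group 1.
    private static final Pattern INNER = Pattern.compile("\\d+");

    // Returns, for each outer match, every number found inside it.
    static List<List<String>> captures(String input) {
        List<List<String>> result = new ArrayList<>();
        Matcher outer = OUTER.matcher(input);
        while (outer.find()) {
            List<String> numbers = new ArrayList<>();
            Matcher inner = INNER.matcher(outer.group(1));
            while (inner.find()) {
                numbers.add(inner.group());
            }
            result.add(numbers);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[123], [567, 341, 123, 789]]
        System.out.println(captures("abc123def abc567 341 123 789def"));
    }
}
```

The second pass recovers all four numbers of the second match, at the cost of splitting the logic across two patterns.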
It is not possible to achieve this in Java.
Java’s Matcher class returns only the last match of a subgroup within each match. That is:
For the regex (\w(\d))+ on the string a1b2c3, the returned groups will be ["a1b2c3", "c3", "3"].
If the regex is changed to (\w(\d)), it will return the matches ["a1", "a1", "1"], ["b2", "b2", "2"], ["c3", "c3", "3"].
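The behavior described above can be checked with a small sketch like the one below. The class name LastCaptureDemo and the helper allGroups are hypothetical names chosen for this illustration; only Pattern, Matcher, find(), group(int) and groupCount() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
public class LastCaptureDemo {
    // Collects groups 0..groupCount() for every successive find() on the input.
    static List<List<String>> allGroups(String regex, String input) {
        List<List<String>> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
            matches.add(groups);
        }
        return matches;
    }

    public static void main(String[] args) {
        // One match; the repeated group keeps only its last iteration.
        // prints [[a1b2c3, c3, 3]]
        System.out.println(allGroups("(\\w(\\d))+", "a1b2c3"));
        // Three separate matches, one per pair.
        // prints [[a1, a1, 1], [b2, b2, 2], [c3, c3, 3]]
        System.out.println(allGroups("(\\w(\\d))", "a1b2c3"));
    }
}
```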