I have a requirement where each component of a method signature has to be identified by its own regular expression. For example, there needs to be one regex for the return type, one for the method name, one for the argument type and one for the argument name. So far I have been able to come up with the following expression:
([^,]+) ([^,]+)\((([^,]+) ([^,]+))\)
It works well for a method signature like:
ReturnType foo(Arg parameter)
The regular expression identifies ReturnType, foo, Arg and parameter separately.
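For reference, here is a minimal Java sketch of how the expression above behaves with java.util.regex. The class and method names (SingleArgDemo, groups) are made up for illustration. Note that group 3 is the whole "Arg parameter" pair, so the argument type and name are groups 4 and 5.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleArgDemo {
    // The expression from the question, with backslashes escaped for a Java string.
    static final Pattern SINGLE_ARG =
        Pattern.compile("([^,]+) ([^,]+)\\((([^,]+) ([^,]+))\\)");

    // Returns {returnType, methodName, argType, argName}, or an empty array if there is no match.
    static String[] groups(String signature) {
        Matcher m = SINGLE_ARG.matcher(signature);
        if (!m.matches()) {
            return new String[0];
        }
        // Group 3 is the combined "type name" pair, so it is skipped here.
        return new String[] { m.group(1), m.group(2), m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("ReturnType foo(Arg parameter)")));
        // Signatures with zero or several arguments do not match at all.
        System.out.println(groups("ReturnType foo()").length);
        System.out.println(groups("ReturnType foo(A a, B b)").length);
    }
}
```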
Now the problem is that a method can have zero, one or multiple arguments separated by commas. I am not able to come up with a repeating expression for this. Any help will be appreciated.
Let’s abstract this out a bit, and say we want to match a (possibly empty) list of digits separated by commas.
The pattern is therefore
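One pattern that fits this description is sketched below in Java. It is an illustration under stated assumptions, not necessarily the exact expression intended here: the whole list is optional, and it consists of one number followed by zero or more ", number" repetitions with optional whitespace around each comma. The names DigitListDemo and isDigitList are made up for the example.

```java
import java.util.regex.Pattern;

class DigitListDemo {
    // (?: \d+ (?: \s*,\s* \d+ )* )?  -> an optional list: first number, then repeated ", number"
    static final Pattern DIGIT_LIST =
        Pattern.compile("(?:\\d+(?:\\s*,\\s*\\d+)*)?");

    // True if the whole string is a (possibly empty) comma-separated list of numbers.
    static boolean isDigitList(String s) {
        return DIGIT_LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "", "1", "1, 2,3", "1,", ",1" }) {
            System.out.println("\"" + s + "\" -> " + isDigitList(s));
        }
    }
}
```

The key part is the shape item(separator item)*, wrapped in an optional group so that the empty list is accepted while a leading or trailing comma is not.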
Now you can try to replace the components to match what you want:
\d+, whatever regex you use to match type name and identifier
\s* around the comma
Note that if you allow generic type parameters, then you definitely can’t use regex, since you can nest the <...> and the language of balanced parentheses of arbitrary depth is not regular.
Although you can argue that in practice no one would ever nest type parameters deeper than, say, 3 levels, so then it becomes regular again.
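Applied to method signatures, the substitution might look like the following Java sketch. The identifier regex ID and the names SignatureDemo, SIGNATURE, ONE_ARG and parse are assumptions for illustration, and the sketch only handles plain identifiers (no generics, arrays, qualified names or modifiers). Because a repeated capturing group in java.util.regex keeps only its last match, the argument list is captured as a whole and then scanned with a second pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignatureDemo {
    // Assumed building block: a simple Java-style identifier, used for both types and names.
    static final String ID = "[A-Za-z_$][A-Za-z0-9_$]*";
    // One argument: type, whitespace, name.
    static final String ARG = ID + "\\s+" + ID;

    // returnType name ( [arg (, arg)*] ), with group 3 holding the raw argument list.
    static final Pattern SIGNATURE = Pattern.compile(
        "\\s*(" + ID + ")\\s+(" + ID + ")\\s*\\(\\s*((?:" + ARG
        + "(?:\\s*,\\s*" + ARG + ")*)?)\\s*\\)\\s*");

    // Used to walk through the captured argument list one argument at a time.
    static final Pattern ONE_ARG = Pattern.compile("(" + ID + ")\\s+(" + ID + ")");

    // Returns [returnType, methodName, argType1, argName1, argType2, argName2, ...].
    static List<String> parse(String signature) {
        Matcher m = SIGNATURE.matcher(signature);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a recognised signature: " + signature);
        }
        List<String> parts = new ArrayList<>();
        parts.add(m.group(1));
        parts.add(m.group(2));
        Matcher arg = ONE_ARG.matcher(m.group(3));
        while (arg.find()) {
            parts.add(arg.group(1));
            parts.add(arg.group(2));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("void bar()"));
        System.out.println(parse("ReturnType foo(Arg parameter)"));
        System.out.println(parse("int add(int a, long b ,String c)"));
    }
}
```

Validating the whole signature first and extracting the arguments in a second pass keeps each pattern small, at the cost of scanning the argument list twice.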
That said, a proper parser is really the best tool for this. Just look for an implementation of the Java grammar, say, in ANTLR.