I am trying to find a regular expression which will do the following (working in Javascript). I want to take a string which contains some tokens like (token) inside parentheses. My aim is to capture the tokens (including the parentheses). I will assume that parenthese are not nested, and that every open parenthesis is eventually closed.
The regular expression I would use is
[[^\(\)]*|(\(.*?\))]*
Let me break it down:
[ # Either of two things:
[^\(\)]* # the first is a substring not containing parentheses
|
( # the second is to be captured...
\(.*?\) # and should contain anything in parentheses - lazy match
)
]* # Any number of these blocks can appear
Needless to say, this will not work (why would I be asking here otherwise?):
var a = /[[^\(\)]*|(\(.*?\))]*/;
a.exec('foo(bar)');
It fails both in Firefox and Node. My previous attempt was a slightly more compicated regex:
(?:[^\(\)]*(\(.*?\)))*[^\(\)]*
which can be described as follows
(?: # A non-capturing group...
[^\(\)]* # ...containing any number of non-parentheses chars
(\(.*?\)) # ...followed by a captured token inside parentheses.
)* # There can be any number of such groups
[^\(\)]* # Finally, any number of non-parentheses, as above
This will work on foo(bar), but will fail on foo(bar)(quux), catpuring only quux.
How should I fix the above regex?
You can’t have an arbitrary amount of capture groups in a regex. use the /g flag to accomplish this instead:
s.match(/\([^\)]+\)/g)