I have text that looks like:
My name is (Richard) and I cannot do
[whatever (Jack) can’t do] and
(Robert) is the same way [unlike
(Betty)] thanks (Jill)
The goal is to use a regular expression to find all parenthesized names that occur anywhere in the text EXCEPT between square brackets.
So in the text above, the result I am looking for is:
- Richard
- Robert
- Jill
You didn’t say what language you’re using, so here’s some Python:
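Here is a minimal sketch. The exact pattern is an assumption, built from the three kinds of chunks the explanation further down describes.

```python
import re

text = """My name is (Richard) and I cannot do
[whatever (Jack) can’t do] and
(Robert) is the same way [unlike
(Betty)] thanks (Jill)"""

# Three alternatives, tried left to right at each position:
#   1. a run of text with no brackets or parens (not captured)
#   2. a parenthesized chunk; only its contents are captured in group 1
#   3. a square-bracketed chunk, including any parens inside it (not captured)
pattern = re.compile(r"[^\[\]()]+|\(([^()]*)\)|\[[^\]]*\]")

# findall returns group 1 for every match, which is '' when alternative 1 or 3 matched,
# so filter(None, ...) drops those empty strings.
names = list(filter(None, pattern.findall(text)))
print(names)
```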
The output is the three names the question asks for: Richard, Robert and Jill.
One caveat is that this does not work with arbitrary nesting. The only nesting it's designed to handle is one level of parens inside square brackets, as in the question. Arbitrarily deep nesting can't be matched with regular expressions alone; this is a consequence of the pumping lemma for regular languages.
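To see the caveat concretely, here is a sketch assuming a pattern built from the three kinds of chunks described in the next paragraph (the exact regex is an assumption). With brackets nested inside brackets, the bracket alternative stops at the first `]`, so a name that should be excluded leaks into the result.

```python
import re

# Assumed pattern: plain text | (captured contents) | [bracketed chunk]
pattern = re.compile(r"[^\[\]()]+|\(([^()]*)\)|\[[^\]]*\]")

# Hypothetical input: (Carl) sits inside the outer brackets and should be excluded.
nested = "[outer [inner] (Carl)] (Dana)"

# "[outer [inner]" is consumed as one bracket chunk, so "(Carl)" is then
# matched as an ordinary parenthesized chunk; the stray "]" matches nothing and is skipped.
result = list(filter(None, pattern.findall(nested)))
print(result)
```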
The regex looks for chunks of text without brackets or parens, chunks of text enclosed in parens, and chunks of text enclosed in brackets. Only text in parens (not in square brackets) is captured. Python's `findall` finds all matches of the regex in sequence. In some languages you may need to write a loop to match repeatedly. For non-paren matches, `findall` inserts an empty string in the result list, so the call to `filter` removes those.
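For languages without a `findall` equivalent, the loop looks roughly like this. It is sketched in Python with `Pattern.search(text, pos)` so the structure carries over; the regex and the helper name `names_outside_brackets` are assumptions for illustration. In Python itself, `finditer` does the same looping for you.

```python
import re

# Assumed pattern: plain text | (captured contents) | [bracketed chunk]
PATTERN = re.compile(r"[^\[\]()]+|\(([^()]*)\)|\[[^\]]*\]")


def names_outside_brackets(text):
    """Hypothetical helper: collect parenthesized names not inside [...]."""
    names = []
    pos = 0
    while True:
        match = PATTERN.search(text, pos)
        if match is None:
            break
        # group(1) is None unless the paren alternative matched ('' for empty parens);
        # skipping falsy values mirrors the filter step.
        if match.group(1):
            names.append(match.group(1))
        # Every alternative consumes at least one character, so this always advances.
        pos = match.end()
    return names


print(names_outside_brackets("thanks (Jill) [not (Jack)] and (Robert)"))
```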