I have a fairly simple regex problem for a little personal experiment that I haven’t quite figured out.
In a string, I might have several <tag>[some characters here] that I need to match. The obvious way to do it would be with a /<tag>\[.*?\]/ regex, to match any characters after the <tag>[ and before the ].
I’d like to be able to have <tag>s within <tag>s, however. This causes a problem. If I had the following:
<tag>[some characters <tag>[in here] to match]
the regex would stop matching as soon as it reached the first closing-bracket, and completely fail to match the last part of the statement. I’ve tried to solve the problem by telling the regex to ignore any internal <tag>s, so I can do a match on the stripped contents later. I haven’t quite gotten it working. The closest I’ve come is:
/<tag>\[(.*?(?:<tag>\[.*?\])*?.*?)\]/
which doesn’t quite work. I would hope that it would match any number of characters, and any inner tags if they exist. It still has trouble with that first closing bracket, however.
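To make the failure concrete, here is a small check of both patterns against the example string. It assumes Ruby as the regex flavor, and the variable names are only for illustration. Because every quantifier is lazy, both patterns stop at the first closing bracket they can reach:

```ruby
text = "<tag>[some characters <tag>[in here] to match]"

# The simple lazy pattern and the attempted fix from above.
lazy    = /<tag>\[.*?\]/
attempt = /<tag>\[(.*?(?:<tag>\[.*?\])*?.*?)\]/

# String#[] with a regex returns the first matched substring (or nil).
lazy_match    = text[lazy]
attempt_match = text[attempt]

puts lazy_match     # <tag>[some characters <tag>[in here]
puts attempt_match  # <tag>[some characters <tag>[in here]
```

In the second pattern the optional inner group is lazy too, so the engine is never forced to use it: the trailing `.*?` simply grows until it hits the first `]`.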
Maybe somebody who’s better at regular expressions knows a good solution to this.
You can use a recursive regex for this, though if the mini-language becomes more complex you should probably drop regex and parse it manually.
Your regex would look something like this:
You can see it in action here: http://rubular.com/r/9F7isgZpj9
Here is the regex broken down to its parts:
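As a minimal sketch of the recursive technique (not necessarily the exact pattern behind the Rubular link), the parts can be laid out in Ruby's extended mode with a comment on each. It assumes Ruby 1.9 or later, where `\g<name>` calls a named group as a subexpression; the constant `NESTED_TAG` and the group name `node` are made up for this example.

```ruby
# Sketch only: NESTED_TAG and the group name "node" are illustrative.
NESTED_TAG = /
  (?<node>        # named group, so the pattern can call itself
    <tag>\[       # the literal opener of a tag
    (?:
      \g<node>    # either a complete nested tag, matched recursively
      |
      [^\]]       # or one character that is not a closing bracket
    )*
    \]            # the closing bracket that pairs with this opener
  )
/x

text = "a <tag>[some characters <tag>[in here] to match] b <tag>[flat]"

# Collect each whole match; scan resumes after an outer tag,
# so inner tags are not reported separately.
matches = []
text.scan(NESTED_TAG) { matches << Regexp.last_match[0] }

p matches
# ["<tag>[some characters <tag>[in here] to match]", "<tag>[flat]"]
```

The nested alternative is tried before the single-character one, so an inner `<tag>[...]` is consumed as a unit and its closing bracket no longer ends the outer match. Inner tags can then be handled by running the same pattern on the contents of each match.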