We’re currently implementing a little tag system into our software. There are just two


Asked: May 29, 2026 (2026-05-29T07:19:37+00:00)


We’re currently implementing a little tag system into our software. There are just two different tag styles: single ones and multiple ones.

The single ones look like this:

<<Single_Tag>>

The multiple ones look like this:

<<Multiple_Tag*>>
... stuff between tag ...
<</Multiple_Tag*>>

The RegEx to find the single ones would be:

<<\w+>>

The RegEx to find the multiple ones would be:

<<(\w+)\*{1}>>((.|\s)*)<</(\w+)\*{1}>>

Are the {1} quantifiers required? Am I right that (.|\s)* needs to be greedy? Otherwise this RegEx would fail on:

<<multiple_tag1*>>
    <<multiple_tag2*>>

    <</multiple_tag2*>>
<</multiple_tag1*>>

Is there maybe an easier way with capturing groups? Excuse me if the following syntax is wrong. The last time I used RegEx was years ago:

<<(\w+)\*{1}>>((.|\s)*)<</($1)\*{1}>>

Here $1 stands for the first capturing group. I’m developing in .NET. I have already checked these on RegExr, but I remember that it’s very easy to overlook something while working with RegEx.


1 Answer

Editorial Team · Answer 1 · 2026-05-29T07:19:39+00:00

See the following Stack Overflow post about parsing HTML with regex, as it applies here as well (my favorite Stack Overflow post ever):

RegEx match open tags except XHTML self-contained tags
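To make the concern concrete for the patterns in the question, here is a small sketch in Python (chosen only for illustration; the same constructs exist in .NET). It assumes two things not spelled out above: that a backreference inside a pattern is written \1 (in .NET, $1 is replacement syntax), and that \*{1} can be shortened to \* because {1} is redundant. The sample inputs `nested` and `siblings` are made up for the demonstration.

```python
import re

# \1 refers back to the first capturing group inside the pattern.
# \*{1} means exactly the same as \*, so the quantifier is dropped.
LAZY = re.compile(r"<<(\w+)\*>>(.*?)<</\1\*>>", re.DOTALL)
GREEDY = re.compile(r"<<(\w+)\*>>(.*)<</\1\*>>", re.DOTALL)

# Hypothetical inputs: the same tag nested in itself, and two siblings.
nested = "<<a*>>x<<a*>>y<</a*>>z<</a*>>"
siblings = "<<a*>>1<</a*>><<a*>>2<</a*>>"

# Lazy matching pairs the OUTER open tag with the INNER close tag.
lazy_nested = LAZY.search(nested).group(2)

# Greedy matching swallows both siblings as if they were one element.
greedy_siblings = GREEDY.search(siblings).group(2)

print(repr(lazy_nested))
print(repr(greedy_siblings))
```

Neither the lazy nor the greedy variant gets both inputs right, which is why counting open and close tags explicitly tends to be more robust than a single pattern.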

Update

One way of solving this is to:

1) Build a tokenizer that tokenizes your input into a sequence of tokens where each token is one of:

* Non-Tag (contains all the content)
* Open-Tag (contains the name of the tag)
* Close-Tag (contains the name of the tag)

2) Call the tokenizer in a loop, and manually keep count of the opening and closing tags, making sure that they balance correctly.

Step (1) could be automated with a lexer generator. In theory, step (2) could be automated by a parser generator, but this may be overkill in this case.

A common lexer and parser generator used in .NET is ANTLR.
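Step (2) can also be written by hand in a few lines. The following is a minimal sketch in Python (for illustration only; the same loop translates directly to .NET). It assumes tokens arrive as (kind, value) pairs using the three kinds listed above, and it uses a stack rather than a plain counter so that tag names are checked as well. The function name `check_balanced` is hypothetical.

```python
def check_balanced(tokens):
    """Return True if every Open-Tag is closed by a Close-Tag
    with the same name, in properly nested order."""
    stack = []  # names of tags that are currently open
    for kind, value in tokens:
        if kind == "Open-Tag":
            stack.append(value)
        elif kind == "Close-Tag":
            # A close tag must match the most recently opened tag.
            if not stack or stack.pop() != value:
                return False
        # Non-Tag tokens carry content only and do not affect balance.
    return not stack  # anything left on the stack was never closed


# Hand-written token list for a nested input, as a tokenizer might emit it.
tokens = [
    ("Open-Tag", "multiple_tag1"),
    ("Open-Tag", "multiple_tag2"),
    ("Non-Tag", "\n"),
    ("Close-Tag", "multiple_tag2"),
    ("Close-Tag", "multiple_tag1"),
]
print(check_balanced(tokens))
```

If only the count matters and names need not match, the stack could be replaced by an integer that is incremented on Open-Tag and decremented on Close-Tag.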

Example

This input

<<Multiple_Tag*>>
... stuff between tag ...
<</Multiple_Tag*>>

would generate the following tokens:

 1. Open-Tag("Multiple_Tag")
 2. Non-Tag("\n... stuff between tag ...\n")
 3. Close-Tag("Multiple_Tag")
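A hand-rolled tokenizer that produces this kind of token stream could look roughly like the sketch below, written in Python for illustration. It assumes that only the starred (multiple) tags are split out as Open-Tag and Close-Tag, so a single tag such as <<Single_Tag>> would stay inside the surrounding Non-Tag text. The names `TAG` and `tokenize` are hypothetical.

```python
import re

# Matches <<Name*>> and <</Name*>>; group 1 is "/" for a closing tag.
TAG = re.compile(r"<<(/?)(\w+)\*>>")


def tokenize(text):
    """Yield (kind, value) pairs: Open-Tag, Close-Tag, or Non-Tag."""
    pos = 0
    for match in TAG.finditer(text):
        if match.start() > pos:
            # Everything between two tags is plain content.
            yield ("Non-Tag", text[pos:match.start()])
        kind = "Close-Tag" if match.group(1) else "Open-Tag"
        yield (kind, match.group(2))
        pos = match.end()
    if pos < len(text):
        # Trailing content after the last tag.
        yield ("Non-Tag", text[pos:])


sample = "<<Multiple_Tag*>>\n... stuff between tag ...\n<</Multiple_Tag*>>"
tokens = list(tokenize(sample))
for token in tokens:
    print(token)
```

The regular expression here only recognizes a single tag at a time, which is a task regex handles well; the nesting logic is left to whatever consumes the tokens.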
