Before somebody points me to that question, I know that one can’t parse html with regex 🙂 And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: the same string, with every opening tag <tag> replaced by ***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I’ve tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn’t really seem to work the way I want it to. Any pointers?
Clarification: it’s guaranteed that there are no self closing tags nor comments in the input.
You just have two problems:
^ is the character to exclude items from a character class, not ~; and the .+ is greedy, so it will match as many characters as possible before the final >. Change it to:
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
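For what it's worth, here is a minimal Python sketch of both fixes applied to the example from the question. The pattern <[^/][^>]*> is only one assumed way to stop at the first >; it uses a negated class rather than a lazy quantifier. The function name mark_opening_tags is made up for illustration. Note that Python spells the whole-match reference \g<0> rather than $0 or $&.

```python
import re

# Assumed pattern: "<", one character that is not "/", then everything
# up to (but not past) the first ">".
OPENING_TAG = re.compile(r"<[^/][^>]*>")

# The question's pattern with only the ~ -> ^ fix; .+ is still greedy.
GREEDY_TAG = re.compile(r"<[^/].+>")


def mark_opening_tags(html: str) -> str:
    """Prefix every opening tag in html with ***."""
    # No capture group needed: \g<0> refers to the whole match.
    return OPENING_TAG.sub(r"***\g<0>", html)


if __name__ == "__main__":
    sample = "<a><b><c></a></b></c>"
    print(mark_opening_tags(sample))
    # The greedy version swallows the whole string as a single match.
    print(GREEDY_TAG.sub(r"***\g<0>", sample))
```

This relies on the guarantee from the question that there are no comments or self-closing tags in the input; it would also trip over a literal > inside an attribute value.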