I was just reviewing a previous post I made and noticed a number of

Asked: May 27, 2026

I was just reviewing a previous post I made and noticed a number of people suggesting that I shouldn’t use regex to parse XML. In that case the XML was relatively simple, and regex didn’t pose any problems. I was also parsing a number of other code formats, so for the sake of uniformity it made sense. But I’m curious how this might pose a problem in other cases. Is this just a ‘don’t reinvent the wheel’ type of issue?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T15:05:56+00:00

The real trouble is nested tags, which are very difficult to handle with regular expressions. It’s possible with balancing groups in .NET, or with recursive patterns in flavors such as PCRE and Perl, but many other engines offer neither. And even with that power, an ill-placed comment can throw off the regular expression.

For example, this is a tricky one to parse…

<div>
    <div id="parse-this">
        <!-- oops</div> -->
        try to get this value with regex
    </div>
</div>

You could be chasing edge cases like this for hours with a regular expression, and maybe find a solution. But really, there’s no point when there are specialized XML, XHTML, and HTML parsers out there that do the job more reliably and efficiently.
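
To make the difference concrete, here is a minimal sketch in Python that runs both approaches on the snippet above. The names `DOC`, `extract_with_regex`, and `extract_with_parser` are made up for illustration, and the regex is just one naive pattern someone might try first, not the only possible one. The parser half relies on the snippet being well-formed XML; real-world HTML would need an HTML parser instead.

```python
import re
import xml.etree.ElementTree as ET

# The tricky snippet from the answer, reproduced verbatim.
DOC = """<div>
    <div id="parse-this">
        <!-- oops</div> -->
        try to get this value with regex
    </div>
</div>"""


def extract_with_regex(doc):
    """Naive attempt: grab everything up to the first closing </div>."""
    match = re.search(r'<div id="parse-this">(.*?)</div>', doc, re.DOTALL)
    return match.group(1).strip() if match else None


def extract_with_parser(doc):
    """Let an XML parser decide what is a comment and what is a tag."""
    root = ET.fromstring(doc)
    target = root.find(".//div[@id='parse-this']")
    # itertext() collects the element's text whether or not the parser
    # keeps comment nodes in the tree.
    return "".join(target.itertext()).strip()


if __name__ == "__main__":
    print(repr(extract_with_regex(DOC)))   # stops inside the comment
    print(repr(extract_with_parser(DOC)))  # the text we actually wanted
```

The regex stops at the `</div>` inside the comment and returns `<!-- oops`, while the parser knows the comment is not markup and returns the intended text. You could patch the pattern to skip comments, but then CDATA sections, attribute values containing `>`, and deeper nesting are each waiting to break it in turn.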
