Now, before you prepare to write a speech about the perils of HTML parsing

Question

Asked: May 19, 2026

Now, before you prepare to write a speech about the perils of HTML parsing with regex: I already know them. This is more of a curiosity question than something I need answered for practical use.

Basically, given a file of HTML in some random but perfectly valid format, can you parse out the content of <p> tags using a half-sane number of regular expressions? (Assume that <p> tags cannot be nested, or some other minor limitation.)

1 Answer

Editorial Team · Answer 1 · 2026-05-19T00:30:43+00:00

It’s certainly possible to extract all the text between {insert character sequence 1 here} and {insert character sequence 2 here} with regular expressions, so long as those sequences aren’t overlapping. For example:

/(?<={insert character sequence 1 here}).*?(?={insert character sequence 2 here})/

Of course, it’s terribly brittle and will break horribly if what you’re running it on is even slightly malformed, or contains either character sequence outside the context where it’s meaningful, or fails in any number of other ways. If you oversimplify the problem, then yes, you can get away with an oversimplified solution.
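As a concrete instance of the pattern above applied to <p> tags, here is a minimal sketch in Python. It assumes that <p> elements are never nested, that every <p> has an explicit closing </p>, and that no attribute value contains a literal ">". Python's `re` module only supports fixed-width lookbehind, so this sketch uses a capture group instead of lookarounds. The names `P_CONTENT` and `extract_paragraphs` are made up for illustration.

```python
import re

# Hypothetical pattern: an opening <p ...> tag, a lazy capture of the
# content, then the closing tag. \b keeps it from matching <pre>, <param>, etc.
# DOTALL lets "." span newlines; IGNORECASE accepts <P> as well as <p>.
P_CONTENT = re.compile(
    r"<p\b[^>]*>(.*?)</p\s*>",
    re.IGNORECASE | re.DOTALL,
)

def extract_paragraphs(html: str) -> list[str]:
    """Return the raw inner content of each <p>...</p> pair.

    Assumes non-nested <p> elements with explicit closing tags.
    """
    return [m.group(1).strip() for m in P_CONTENT.finditer(html)]

sample = """
<html><body>
<p>First paragraph.</p>
<P class="note">Second,
spanning two lines.</P>
<pre>not a paragraph</pre>
</body></html>
"""

print(extract_paragraphs(sample))
```

It shares the brittleness described above: an omitted </p> (which valid HTML allows), a "<p>" inside a comment or script, or a ">" inside an attribute value will all produce wrong results.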
