I’m looking for an efficient means of extracting an html fragment from an html

Asked: May 18, 2026 (2026-05-18T10:43:06+00:00)

I’m looking for an efficient means of extracting an html “fragment” from an html document. My first implementation of this used the Html Agility Pack. This appeared to be a reasonable way to attack this problem, until I started running the extraction on large html documents – performance was very poor for something so trivial (I’m guessing due to the amount of time it was taking to parse the entire document).

Can anyone suggest a more efficient means of achieving my goal?

To summarize:

- For my purposes, an html “fragment” is defined as all content inside of the <body> tags of an html document.
- Ideally, I’d like to return the content unaltered if it didn’t contain an <html> or <body> tag (I’ll assume I was passed an html fragment to begin with).
- I have the entire html document available in memory (as a string) and won’t be streaming it on demand, so a potential solution won’t need to worry about that.
- Performance is critical, so a potential solution should account for this.

Sample Input:

<html>
   <head>
     <title>blah</title>
   </head>
   <body>
    <p>My content</p>
   </body>
</html>

Desired Output:

<p>My content</p>

A solution in C# or VB.NET would be welcome.

1 Answer

Editorial Team · Answer 1 · 2026-05-18T10:43:07+00:00

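Most html is not going to be XHTML compliant, so I would skip the parser entirely. Whether the text comes from an HTTP GET request or is already in memory as a string, search it for "<body" and "</body>" with .IndexOf(...) rather than .Contains(...), since Contains only tells you whether the tag is present, not where it is. Use a case-insensitive comparison, and remember that the opening tag may carry attributes, so the content starts after the first ">" that follows "<body". You can then use those two positions as the start and stop indexes for a Substring call. If no <body> tag is found, return the input unchanged. Outside the body tag you really don’t need to worry about XML compliance.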
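The index-based approach can be sketched in C# roughly as follows. This is a minimal illustration, not a tested production routine: the class and method names (`HtmlFragment`, `ExtractBody`) are made up for this example, and it assumes the first real `<body` tag in the string is the document body (it does not skip `<body` text inside comments or scripts, and an attribute value containing `>` would confuse it). It also trims surrounding whitespace from the extracted body so the sample input yields the desired output; drop the `Trim()` call if the whitespace matters.

```csharp
using System;

public static class HtmlFragment
{
    // Hypothetical helper: returns the content between <body ...> and </body>.
    // If no <body> tag is found, the input is returned unaltered, on the
    // assumption that it was already a fragment.
    public static string ExtractBody(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        // Find "<body" followed by '>' or whitespace, so that a tag with
        // attributes matches but something like "<bodyfoo>" does not.
        int open = -1;
        int from = 0;
        while ((from = html.IndexOf("<body", from, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            int next = from + 5; // length of "<body"
            if (next < html.Length && (html[next] == '>' || char.IsWhiteSpace(html[next])))
            {
                open = from;
                break;
            }
            from = next;
        }
        if (open < 0) return html;

        // Content starts just after the '>' that closes the opening tag.
        int start = html.IndexOf('>', open);
        if (start < 0) return html;
        start++;

        // Content ends at the last closing tag; if it is missing,
        // take everything up to the end of the string.
        int end = html.LastIndexOf("</body", StringComparison.OrdinalIgnoreCase);
        if (end < start) end = html.Length;

        return html.Substring(start, end - start).Trim();
    }
}
```

Calling `HtmlFragment.ExtractBody(html)` on the sample input from the question should give `<p>My content</p>`, and a string with no `<body>` tag comes back as-is. Because it only does a few ordinal scans and one `Substring`, the cost grows linearly with the document length and no DOM is built, which is the point of avoiding a full parse here.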
