I have a source to a web page and I need to extract the

Editorial Team

Asked: May 25, 2026

I have the source of a web page and need to extract the body, i.e. everything between </head><body> and </body></html>.

I’ve tried the following with no success:

var match = Regex.Match(output, @"(?<=\</head\>\<body\>)(.*?)(?=\</body\>\</html\>)");

It finds a string but cuts it off long before </body></html>. I escaped characters based on the RegEx cheat sheet.

What am I missing?
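A minimal sketch of one likely cause, assuming the page source contains line breaks inside the body: in .NET, `.` does not match `\n` unless `RegexOptions.Singleline` is set, so `(.*?)` cannot run across lines. The pattern below keeps the question's lookbehind/lookahead idea; the `BodyExtractor` class, the `ExtractBody` helper and the sample input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BodyExtractor
{
    // Returns the text between "</head><body>" and "</body></html>",
    // or null when the page source contains no such span.
    public static string ExtractBody(string output)
    {
        // '<', '>' and '/' are not regex metacharacters, so they need no escaping.
        // RegexOptions.Singleline lets '.' match '\n', so the lazy (.*?)
        // can continue past line breaks inside the body.
        Match match = Regex.Match(
            output,
            @"(?<=</head><body>)(.*?)(?=</body></html>)",
            RegexOptions.Singleline);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Hypothetical page source with a multi-line body.
        string output = "<html><head></head><body>\n<p>one</p>\n<p>two</p>\n</body></html>";
        Console.WriteLine(ExtractBody(output));
    }
}
```

This still only works when the tags appear exactly as written, with no whitespace between them and no attributes on <body>, which is part of why a real HTML parser is the more robust choice.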

1 Answer

Editorial Team · Answer 1 · 2026-05-25T20:49:54+00:00

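I’d recommend using the HtmlAgilityPack instead. Parsing HTML with regular expressions is very fragile: extra whitespace between tags, attributes on <body>, or different tag casing can all make a pattern stop matching.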

The latest version even supports LINQ, so you can get your content like this:

using System.Linq;      // for Single()
using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://stackoverflow.com");
string html = doc.DocumentNode.Descendants("body").Single().InnerHtml;
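Since the question already has the page source in a string, a minimal sketch that parses it with `HtmlDocument.LoadHtml` instead of downloading it with `HtmlWeb.Load` may be closer to the use case. It assumes the HtmlAgilityPack NuGet package is referenced; `HapBodyExample` and `GetBodyHtml` are hypothetical names.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HapBodyExample
{
    // Returns the inner HTML of the page's single <body> element.
    public static string GetBodyHtml(string pageSource)
    {
        // Parse HTML that is already in memory rather than fetching a URL.
        var doc = new HtmlDocument();
        doc.LoadHtml(pageSource);

        // Descendants("body") yields every <body> element;
        // Single() throws if there is not exactly one.
        return doc.DocumentNode.Descendants("body").Single().InnerHtml;
    }

    public static void Main()
    {
        string output = "<html><head></head><body><p>hi</p></body></html>";
        Console.WriteLine(GetBodyHtml(output));
    }
}
```

Unlike the regex, this does not care about whitespace between the tags or attributes on <body>.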
