I’ve tried to understand a few examples, including questions here so I apologise if

Asked: May 15, 2026

I’ve tried to understand a few examples, including questions here, so I apologise if this seems like a duplicate, but I cannot find a regular expression I can understand.
I have some HTML to parse using an XML parser, but I want to strip out the <head>...</head> section first, as the rest of the content is valid enough for normal XML parsing.
Everything from <head> to </head>, both the tags and their content, must be removed without affecting the surrounding HTML (the <body> tags etc.).
For reference, this is the markup, including the head section I want removed:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<html>
    <head>
    <link rel="stylesheet" type="text/css" href="/style/stylesheet.css" />
    <meta name="description" content="Information" />
    <base target="_top">
</head>
<body>
<!-- Body Here -->
</body>
</html>

I also need to strip the DOCTYPE; if this can be done with a regex, that would be great. The head is always the same: I want to remove only the span from <head> to </head> inclusive and, if possible, the DOCTYPE as well.

This also needs to work in Silverlight, using System.Text.RegularExpressions or something similar.
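A minimal sketch of what I am after, assuming the head and DOCTYPE always look like the sample above (regex is fragile on arbitrary HTML, so this relies on that). It also assumes RegexOptions.IgnoreCase and RegexOptions.Singleline are available in Silverlight's System.Text.RegularExpressions; I have not checked that on Silverlight itself. StripHeadExample and StripDocTypeAndHead are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

class StripHeadExample
{
    // Matches the DOCTYPE declaration, e.g. <!DOCTYPE HTML PUBLIC "..." >
    static readonly Regex DocTypePattern =
        new Regex(@"<!DOCTYPE[^>]*>", RegexOptions.IgnoreCase);

    // Matches <head ...> through </head> inclusive. Singleline lets '.' cross
    // line breaks; the lazy '.*?' stops at the first </head>. The \b keeps it
    // from matching tags such as <header>.
    static readonly Regex HeadPattern =
        new Regex(@"<head\b[^>]*>.*?</head>",
                  RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: removes the DOCTYPE and the whole head section.
    public static string StripDocTypeAndHead(string html)
    {
        string result = DocTypePattern.Replace(html, string.Empty);
        result = HeadPattern.Replace(result, string.Empty);
        return result.Trim();
    }

    static void Main()
    {
        // The sample markup from the question.
        string html =
            "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" >\n" +
            "<html>\n" +
            "    <head>\n" +
            "    <link rel=\"stylesheet\" type=\"text/css\" href=\"/style/stylesheet.css\" />\n" +
            "    <meta name=\"description\" content=\"Information\" />\n" +
            "    <base target=\"_top\">\n" +
            "</head>\n" +
            "<body>\n<!-- Body Here -->\n</body>\n</html>";

        Console.WriteLine(StripDocTypeAndHead(html));
    }
}
```

Removing the head also removes the unclosed <base target="_top"> tag, which is the part that would otherwise stop an XML parser.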

1 Answer

Editorial Team · Answer 1 · May 15, 2026


Extracting the body turned out to be easier. Here is the regex I am using:

@"(?s)\<body\>(.*?)\</body\>"

(The (?s) inline option makes . match newlines too, so the match can span a multi-line body.)

Now I can parse that normally with LINQ-to-XML!
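A fuller sketch of the body-extraction approach, assuming the body content is well-formed XML (the <!-- Body Here --> placeholder stands in for real markup). It uses RegexOptions.Singleline so the lazy (.*?) can span line breaks, also allows attributes on the <body> tag, and passes the whole match, tags included, to XElement.Parse so there is a single root element. ExtractBodyExample and ExtractBody are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

class ExtractBodyExample
{
    // Singleline makes '.' match newlines; IgnoreCase also accepts <BODY>.
    // [^>]* tolerates attributes such as <body class="main">.
    static readonly Regex BodyPattern =
        new Regex(@"<body\b[^>]*>(.*?)</body>",
                  RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the <body>...</body> element parsed with
    // LINQ to XML, or null when no body is found.
    public static XElement ExtractBody(string html)
    {
        Match match = BodyPattern.Match(html);
        if (!match.Success)
            return null;

        // match.Value includes the <body> tags, giving a single root element.
        return XElement.Parse(match.Value);
    }

    static void Main()
    {
        string html = "<html>\n<head>\n<base target=\"_top\">\n</head>\n" +
                      "<body>\n<p>Hello</p>\n<!-- Body Here -->\n</body>\n</html>";

        XElement body = ExtractBody(html);
        foreach (XElement p in body.Elements("p"))
            Console.WriteLine(p.Value); // prints "Hello"
    }
}
```

Parsing only the matched body sidesteps both the DOCTYPE and the non-XML <base> tag in the head, but XElement.Parse will still throw if the body itself is not well-formed.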
