In .NET/C#, I want to validate some HTML code. For instance, I have the following HTML:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title></title></head>
<body>
CDATA section number 1?
</body>
</html>
I have the following C# code:
string htmlCode = ... // for instance the html above
var settings = new XmlReaderSettings { ValidationType = ValidationType.DTD };
settings.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
{
    throw new XmlException(e.Message);
};
using (var srdr = new StringReader(htmlCode))
using (var xrdr = new XmlTextReader(srdr))
using (var vrdr = XmlReader.Create(xrdr, settings))
{
    try
    {
        while (vrdr.Read()) { }
    }
    catch (XmlException ex)
    {
        // do some stuff
    }
}
When I run this code, I get this exception:
System.Net.WebException : The remote server returned an error: (403) Forbidden.
at System.Net.HttpWebRequest.GetResponse()
What’s wrong with what I’ve done? Thanks in advance for your help.
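It’s not your code. The reader tries to download the DTD from w3.org, and the W3C server rejects such automated requests because of excessive DTD traffic:
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
You need to supply the DTD yourself, for instance by using a custom
XmlResolver which returns the DTD from a local resource.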
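
A minimal sketch of such a resolver follows. It assumes you have saved xhtml1-transitional.dtd and the .ent files it references into a local folder. The names LocalDtdResolver, LocalDtdValidator and dtdFolder are made up for this example and are not part of the framework. It also creates the reader directly from the StringReader rather than wrapping an XmlTextReader, so that only one reader (and one resolver) is involved, and it assumes .NET 4 or later for DtdProcessing.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;
using System.Xml.Schema;

// Hypothetical resolver: maps any requested DTD/entity URI to a file with the
// same name in a local folder, so no HTTP request is ever made.
public sealed class LocalDtdResolver : XmlResolver
{
    private readonly string _dtdFolder;

    public LocalDtdResolver(string dtdFolder)
    {
        _dtdFolder = dtdFolder;
    }

    // Credentials are irrelevant for local files; the setter is a no-op.
    public override ICredentials Credentials
    {
        set { }
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // e.g. http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
        //   -> <dtdFolder>/xhtml1-transitional.dtd
        string fileName = Path.GetFileName(absoluteUri.AbsolutePath);
        string localPath = Path.Combine(_dtdFolder, fileName);
        if (!File.Exists(localPath))
            throw new XmlException("No local copy of " + absoluteUri);
        return File.OpenRead(localPath);
    }
}

public static class LocalDtdValidator
{
    // Throws XmlException on the first validation error or malformed markup.
    public static void Validate(string htmlCode, string dtdFolder)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,   // allow the DOCTYPE to be read
            ValidationType = ValidationType.DTD,
            XmlResolver = new LocalDtdResolver(dtdFolder)
        };
        settings.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
        {
            throw new XmlException(e.Message);
        };

        using (var srdr = new StringReader(htmlCode))
        using (var vrdr = XmlReader.Create(srdr, settings))
        {
            while (vrdr.Read()) { }
        }
    }
}
```

Matching on the file name alone keeps the sketch short and works because the entity files referenced by the XHTML DTD sit next to it on the server. If you need to be stricter, you could key the lookup on the full URI instead.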