This is the function:
```csharp
private static HtmlAgilityPack.HtmlDocument getHtmlDocumentWebClient(string url, bool useProxy, string proxyIp, int proxyPort, string usename, string password)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    using (WebClient client = new WebClient())
    {
        //client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
        client.Credentials = CredentialCache.DefaultCredentials;
        client.Proxy = WebRequest.DefaultWebProxy;
        if (useProxy && !string.IsNullOrEmpty(proxyIp))
        {
            // Build the caller-supplied proxy, with credentials if a user name was given.
            WebProxy p = new WebProxy(proxyIp, proxyPort);
            if (!string.IsNullOrEmpty(usename))
            {
                p.Credentials = new NetworkCredential(usename, password ?? string.Empty);
            }
            // Route the request through that proxy.
            client.Proxy = p;
        }
        // Download the body and hand it to HtmlAgilityPack.
        using (Stream data = client.OpenRead(url))
        {
            doc.Load(data);
        }
    }
    return doc;
}
```
I'm collecting links on each iteration of my program, and after a few iterations the variable url is:
http://appldnld.apple.com/iTunes10/041-7196.20120912.Ber43/iTunesSetup.exe
If I open this link in Internet Explorer, it tries to download the file.
But my program tries to load it as HTML on the line:
doc.Load(data);
After some time this makes the program freeze, and when I finally force it to end in Task Manager, it throws an exception:
System.StackOverflowException was unhandled
Message: An unhandled exception of type 'System.StackOverflowException' occurred in HtmlAgilityPack.dll
Using a breakpoint, I found that the problem happens on the line:
doc.Load(data);
The question is: how should I handle links like this? Should I just ignore them with a try/catch, or should I still treat this as a link? And what if, in the future, I want to use these links to download the .exe files; then maybe try/catch is not a good idea?
Edit:
This is what getHtmlDocumentWebClient looks like now:
```csharp
private static HtmlAgilityPack.HtmlDocument getHtmlDocumentWebClient(string url, bool useProxy, string proxyIp, int proxyPort, string usename, string password)
{
    // Create an HTTP-specific request.
    HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    myHttpWebRequest.Method = "GET";
    // ContentType describes the body of the request being sent;
    // it does not control what kind of content the server returns.
    myHttpWebRequest.ContentType = "text/html; encoding='utf-8'";

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    // Get the response and load its body into HtmlAgilityPack.
    using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
    using (Stream data = myHttpWebResponse.GetResponseStream())
    {
        doc.Load(data);
    }
    return doc;
}
```
Same problem. What's wrong with the function now, and how do I actually check for text/html content?
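You should check the `Content-Type` before trying to parse the response as HTML. If it isn't `text/html` or one of its variants, don't parse it. The simplest way to get the `Content-Type` is to use `HttpWebRequest` instead of `WebClient`. You can then check `response.ContentType` (or `response.Headers["Content-Type"]`).

A try/catch would not help with this symptom anyway: since .NET Framework 2.0, a `StackOverflowException` cannot be caught by a try/catch block, and the process is terminated.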
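A minimal sketch of that check, using only the standard `System.Net` types. The names `HtmlFetcher`, `IsHtmlContentType` and `TryDownloadHtml` are hypothetical, and treating `application/xhtml+xml` as an HTML variant is an assumption; adjust the accepted media types to your needs. The idea is to read the response headers first and only read the body when the server declares it to be HTML.

```csharp
using System;
using System.IO;
using System.Net;

public static class HtmlFetcher
{
    // Hypothetical helper: true for "text/html" and its variants such as
    // "text/html; charset=utf-8". Accepting "application/xhtml+xml" is an assumption.
    public static bool IsHtmlContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return false;

        // Drop parameters like "; charset=utf-8" and compare only the media type.
        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase)
            || mediaType.Equals("application/xhtml+xml", StringComparison.OrdinalIgnoreCase);
    }

    // Hypothetical helper: returns the page's HTML, or null when the server
    // reports a non-HTML body (for example an .exe served as application/octet-stream).
    public static string TryDownloadHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Tells the server what we prefer; the server may still send something else.
        request.Accept = "text/html,application/xhtml+xml";

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Inspect the header before reading the body, so a binary body
            // is never handed to the HTML parser.
            if (!IsHtmlContentType(response.ContentType))
                return null;

            using (Stream body = response.GetResponseStream())
            // StreamReader(Stream) decodes as UTF-8 unless a byte order mark says otherwise.
            using (var reader = new StreamReader(body))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

A caller would create an `HtmlAgilityPack.HtmlDocument` and call `doc.LoadHtml(html)` only when the result is non-null; a null result marks a non-HTML link that can be skipped now, or saved to disk later if you decide to download the .exe files. Note that `GetResponse` throws a `WebException` for error status codes such as 404, so a try/catch around network errors is still reasonable; unlike `StackOverflowException`, that exception can be caught.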