This is the function:
```csharp
private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
{
    List<string> mainLinks = new List<string>();
    var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
    if (linkNodes != null)
    {
        foreach (HtmlNode link in linkNodes)
        {
            var href = link.Attributes["href"].Value;
            if (href.StartsWith("http://") || href.StartsWith("https://") || href.StartsWith("www")) // filter for http
            {
                mainLinks.Add(href);
            }
        }
    }
    return mainLinks;
}
```
Sometimes the variable document is null, for example when the site times out or doesn't respond, or when the link is not in the right format. Let's say, for example, that the link is: wdfsfdgfsdg
So in the function test I'm doing:
```csharp
private List<string> test(string url, int levels, DoWorkEventArgs eve)
{
    levels = levelsTo;
    HtmlWeb hw = new HtmlWeb();
    List<string> webSites;
    try
    {
        this.Invoke(new MethodInvoker(delegate { Texts(richTextBox1, "Loading The Url: " + url + "...", Color.Red); }));
        HtmlAgilityPack.HtmlDocument doc = to.GetHtmlDoc(url, reqOptions, null);
        if (timeOut)
        {
            this.Invoke(new MethodInvoker(delegate { Texts(richTextBox1, " There Was A TimeOut" + Environment.NewLine, Color.Red); }));
            timeOut = false;
        }
        else
        {
            this.Invoke(new MethodInvoker(delegate { Texts(richTextBox1, " Done " + Environment.NewLine, Color.Red); }));
        }
        webSites = getLinks(doc);
        // ... (rest of test() not shown)
```
So let's say the url is wdfsfdgfsdg. Then webSites = getLinks(doc) is called, but since the url is wrong, the variable doc is null, so I need to handle this case either here in the test function or in the getLinks function. What I want is to tell the user that there was a timeout, but also to continue the process with the next url. The test function calls itself again and again, like a crawler, and each time the variable url contains a different url.
This is the line where I'm doing the crawling:
```csharp
csFiles.AddRange(test(t, levels - 1, eve));
```
csFiles is a local List.
So each time, url contains another link, and the function then tries to get the links of that website.
But since doc is null and it is passed to the function getLinks, I get a NullReferenceException on this line in getLinks and the program stops:
```csharp
var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
```
The exception happens because document is null.
So how can I handle this case and make the program continue to the next link, instead of stopping because of the null and the exception?
If needed, I will update the question and add the full test function.
Well, it should be as simple as checking for null.
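A minimal sketch of that null check inside getLinks, assuming the HtmlAgilityPack NuGet package the question already uses. The class name LinkExtractor is hypothetical; the method keeps the question's prefix filter (using ordinal comparisons) and returns an empty list when document is null, so the caller's csFiles.AddRange(...) simply adds nothing and the crawl carries on. SelectNodes returns null rather than an empty collection when nothing matches, which is why the original code already checks linkNodes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package HtmlAgilityPack, as in the question

public static class LinkExtractor
{
    // The question's getLinks, plus a guard for a null document.
    public static List<string> GetLinks(HtmlAgilityPack.HtmlDocument? document)
    {
        var mainLinks = new List<string>();

        // A failed load (timeout, malformed URL such as "wdfsfdgfsdg") leaves
        // document null: return an empty list instead of dereferencing it.
        if (document == null)
        {
            return mainLinks;
        }

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection? linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
        if (linkNodes == null)
        {
            return mainLinks;
        }

        foreach (HtmlNode link in linkNodes)
        {
            // The XPath guarantees an href attribute; the default is only a safety net.
            string href = link.GetAttributeValue("href", string.Empty);
            if (href.StartsWith("http://", StringComparison.Ordinal) ||
                href.StartsWith("https://", StringComparison.Ordinal) ||
                href.StartsWith("www", StringComparison.Ordinal))
            {
                mainLinks.Add(href);
            }
        }
        return mainLinks;
    }
}
```

Keeping HtmlAgilityPack.HtmlDocument fully qualified avoids the clash with System.Windows.Forms.HtmlDocument in a WinForms project like the one in the question.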
If you’re dependent on information from document, then you can either try to get the information from somewhere else, or abort the operation. There’s not much more you can do.
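If you would rather abort the operation for that one URL in the caller, the check can sit in test itself: report the problem and return an empty list, so the recursive call contributes nothing and the crawl moves on to the next link. The sketch below is a simplified recursion, not the asker's full test function; it assumes a hypothetical loadLinks delegate standing in for to.GetHtmlDoc(...) followed by getLinks(...) (returning null when the page could not be loaded) and a hypothetical report delegate standing in for the Texts(...) calls.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class CrawlSketch
{
    // loadLinks: hypothetical stand-in for loading a page and extracting its links;
    //            returns null when the page could not be loaded.
    // report:    hypothetical stand-in for writing a message to the UI.
    public static List<string> Crawl(
        string url,
        int levels,
        Func<string, List<string>?> loadLinks,
        Action<string> report)
    {
        var collected = new List<string>();
        if (levels <= 0)
        {
            return collected;
        }

        List<string>? links = loadLinks(url);
        if (links == null)
        {
            // Tell the user, then return an empty list so the caller's
            // AddRange(...) simply continues with the next URL.
            report("Could not load " + url + ", skipping.");
            return collected;
        }

        collected.AddRange(links);
        foreach (string link in links)
        {
            // Each child call handles its own failures the same way.
            collected.AddRange(Crawl(link, levels - 1, loadLinks, report));
        }
        return collected;
    }
}
```

Because a failed page returns an empty list rather than throwing, one bad link never stops the rest of the crawl.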
You can also use try/catch, but I would avoid depending on try/catch if you can avoid having an exception altogether by simply checking for null.
try/catch is better for unexpected exceptions that you have to handle, or for exceptions that you don’t have control over.
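For the malformed-URL case in particular, a check method lets you avoid the exception altogether. A minimal sketch, assuming only http/https pages should be crawled; IsCrawlableUrl is a hypothetical helper built on Uri.TryCreate, which returns false instead of throwing for input such as wdfsfdgfsdg. It is stricter than the question's filter: a scheme-less href such as www.example.com is rejected unless you prepend http:// first.

```csharp
#nullable enable
using System;

public static class UrlCheck
{
    // True only for absolute http/https URLs. Malformed input such as
    // "wdfsfdgfsdg" simply yields false; no exception is thrown.
    public static bool IsCrawlableUrl(string? url)
    {
        return Uri.TryCreate(url, UriKind.Absolute, out Uri? uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

Calling it before to.GetHtmlDoc(...) skips bad links cheaply, which leaves try/catch for the network request itself, whose failures are outside your control.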