This is the function: private List<string> getLinks(HtmlAgilityPack.HtmlDocument document) { List<string> mainLinks = new List<string>();

Asked: June 11, 2026, 14:42:12 UTC

This is the function:

private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
{
    List<string> mainLinks = new List<string>();
    var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
    if (linkNodes != null)
    {
        foreach (HtmlNode link in linkNodes)
        {
            var href = link.Attributes["href"].Value;
            if (href.StartsWith("http://") || href.StartsWith("https://") || href.StartsWith("www")) // filter for http
            {
                mainLinks.Add(href);
            }
        }
    }

    return mainLinks;
}

Sometimes the variable document is null, for example when the site times out, does not respond, or the link is not in a valid format. Let's say, for example, that the link is: wdfsfdgfsdg
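A malformed link such as wdfsfdgfsdg can be rejected before any request is made, so no document is ever requested for it. The following is a minimal sketch, not part of the original code: the class UrlCheck and the helper TryGetCrawlableUri are hypothetical names, and the sketch assumes that only absolute http/https links (plus links starting with "www.", mirroring the "www" case in getLinks) should be crawled. It relies on Uri.TryCreate, which reports failure through its return value instead of throwing.

```csharp
using System;

public static class UrlCheck
{
    // Hypothetical helper: returns true (and the parsed Uri) only for absolute
    // http/https URLs. Links starting with "www." get an "http://" prefix first.
    public static bool TryGetCrawlableUri(string url, out Uri uri)
    {
        uri = null;
        if (string.IsNullOrWhiteSpace(url))
        {
            return false;
        }

        string candidate = url.Trim();
        if (candidate.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
        {
            candidate = "http://" + candidate;
        }

        // TryCreate never throws on bad input; it just returns false.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out Uri parsed))
        {
            return false;
        }

        // Reject absolute URIs with other schemes (mailto:, file:, ftp:, ...).
        if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        uri = parsed;
        return true;
    }
}
```

In test, a false result could be reported with the same Texts(...) call used for the timeout message, after which the function returns an empty list instead of downloading anything.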

In the function test I'm doing this:

private List<string> test(string url, int levels, DoWorkEventArgs eve)
{
    levels = levelsTo;
    HtmlWeb hw = new HtmlWeb();
    List<string> webSites;
    try
    {
        this.Invoke(new MethodInvoker(delegate { Texts(richTextBox1, "Loading The Url:   " + url + "...", Color.Red); }));
        HtmlAgilityPack.HtmlDocument doc = to.GetHtmlDoc(url, reqOptions, null);
        if (timeOut == true)
        {
            this.Invoke(new MethodInvoker(delegate { Texts(richTextBox1, " There Was A TimeOut" + Environment.NewLine, Color.Red); }));
            timeOut = false;
        }
        else
        {
            this.Invoke(new MethodInvoker(delegate { Texts(richTextBox1, " Done " + Environment.NewLine, Color.Red); }));
        }
        webSites = getLinks(doc);
        // ... rest of test() not shown in the question

So let's say the url is wdfsfdgfsdg. Then test calls getLinks(doc) to fill webSites, but since the url is invalid, doc is null. So I need to handle this case either here in the test function or in the getLinks function. What I want is to tell the user that there was a timeout, but also to continue the process with the next url. The test function calls itself again and again, like a crawler, and each time the variable url contains a different url.

This is the line where I do the crawling:

csFiles.AddRange(test(t, levels - 1, eve));

csFiles is a local List.

So each time, url contains another link, and test tries to get the links of that website. But when doc is null, it is still passed to getLinks, and in getLinks, on the line:

var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");

I get a NullReferenceException and the program stops, because document is null.

So how can I handle this case so that the program continues to the next link, instead of stopping with an exception because document is null?

If it's needed, I will update the question and add the full test function.

1 Answer

Editorial Team · Answer 1 · 2026-06-11T14:42:14+00:00

Well, it should be as simple as checking for null.

HtmlNodeCollection linkNodes = null;

if (document != null)
{
    linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
    // other things if document is not null
}
else
{
    // handle null case
}
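Applied to the question's code, the null check can live inside getLinks itself, so every caller gets an empty list instead of an exception. This is a sketch that assumes the HtmlAgilityPack NuGet package is referenced; the class name LinkExtractor is made up for illustration. It also guards the result of SelectNodes, which can return null rather than an empty collection when no node matches, and uses GetAttributeValue with a fallback instead of indexing Attributes directly.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Returns the http/https/www links in the document, or an empty list when
    // the document could not be loaded (null) or contains no <a href> nodes.
    public static List<string> GetLinks(HtmlDocument document)
    {
        var mainLinks = new List<string>();

        // A failed download (timeout, bad URL) may leave the caller with no document.
        if (document == null)
        {
            return mainLinks;
        }

        // SelectNodes can return null instead of an empty collection when nothing matches.
        HtmlNodeCollection linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
        if (linkNodes == null)
        {
            return mainLinks;
        }

        foreach (HtmlNode link in linkNodes)
        {
            // Falls back to "" if the attribute is somehow missing.
            string href = link.GetAttributeValue("href", string.Empty);
            if (href.StartsWith("http://", StringComparison.Ordinal) ||
                href.StartsWith("https://", StringComparison.Ordinal) ||
                href.StartsWith("www", StringComparison.Ordinal))
            {
                mainLinks.Add(href);
            }
        }

        return mainLinks;
    }
}
```

Because the method never throws for a missing document, csFiles.AddRange(test(...)) simply receives fewer (or zero) links for a failed page.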

If you’re dependent on information from document, then you can either try to get the information from somewhere else or abort the operation. There’s not much more you can do.
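Aborting the operation for one URL does not have to stop the whole crawl. Here is a sketch of that idea that is independent of HtmlAgilityPack: tryGetLinks is a hypothetical stand-in for "download the page with to.GetHtmlDoc, then call getLinks" and is assumed to return null when the page could not be loaded; report is a hypothetical stand-in for the Texts(...) calls in the question. The visited set is an addition not present in the question, used to avoid crawling the same page twice.

```csharp
using System;
using System.Collections.Generic;

public static class Crawler
{
    // Recursive crawl shaped like test(url, levels, ...) in the question.
    // levels: how many more link hops to follow (0 = only this page).
    // tryGetLinks: hypothetical "load page + getLinks" step; returns null on failure.
    // report: hypothetical callback for user-facing messages.
    public static List<string> Crawl(
        string url,
        int levels,
        Func<string, List<string>> tryGetLinks,
        Action<string> report,
        HashSet<string> visited = null)
    {
        var found = new List<string>();
        if (visited == null)
        {
            visited = new HashSet<string>(StringComparer.Ordinal);
        }

        // HashSet.Add returns false if this URL was already crawled.
        if (!visited.Add(url))
        {
            return found;
        }

        List<string> links = tryGetLinks(url);
        if (links == null)
        {
            // Tell the user, then abandon only this URL. The caller's AddRange
            // receives an empty list and its loop moves on to the next link.
            report("Could not load " + url + ", skipping it.");
            return found;
        }

        found.AddRange(links);
        if (levels > 0)
        {
            foreach (string link in links)
            {
                found.AddRange(Crawl(link, levels - 1, tryGetLinks, report, visited));
            }
        }

        return found;
    }
}
```

In the WinForms code, report could wrap the existing this.Invoke(new MethodInvoker(...)) call so the message still appears in richTextBox1.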

You can also use try/catch:

try
{
    // some code here
}
catch (Exception ex)
{
    // log exception, display error to user, or handle exception some way
}
finally
{
    // optional block: clean up resources
}

I would avoid depending on try/catch if you can avoid having an exception altogether by simply checking for null.

try/catch is better for unexpected exceptions that you have to handle, or for exceptions that you don’t have control over.
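For the part you don't control, such as the download itself, one option is to catch only the exceptions a failed download is expected to throw and turn them into null, so the null check handles timeouts and bad URLs the same way. This is a sketch: the names FetchGuard and TryOrNull, and the exact list of exception types, are assumptions. Any other exception, which usually signals a real bug, still propagates.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class FetchGuard
{
    // Runs a download-like operation. Expected failures are reported through
    // onError and turned into null; any other exception propagates unchanged.
    public static T TryOrNull<T>(Func<T> fetch, Action<Exception> onError) where T : class
    {
        try
        {
            return fetch();
        }
        catch (Exception ex) when (ex is WebException            // HttpWebRequest-based downloads
                                   || ex is HttpRequestException  // HttpClient network errors
                                   || ex is TaskCanceledException // HttpClient timeouts
                                   || ex is UriFormatException)   // malformed URL strings
        {
            onError(ex);
            return null;
        }
    }
}
```

Inside test, the download could then be wrapped as FetchGuard.TryOrNull(() => to.GetHtmlDoc(url, reqOptions, null), ex => ...), and the resulting doc (possibly null) passed to a getLinks that checks for null.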
