I have built a little crawler and now when trying it out I found


Asked: June 12, 2026


I have built a little crawler, and while trying it out I found that it uses 98-99% CPU when crawling certain sites.

I used dotTrace to see what the problem could be, and it pointed me towards my HttpWebRequest method. I optimised it a bit with the help of some previous questions here on Stack Overflow, but the problem was still there.

I then checked which URLs were causing the CPU load and found that they were sites with extremely large pages. Go figure 🙂
So now I am 99% certain it has to do with the following piece of code:

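```csharp
// _html holds the raw HTML string downloaded by the crawler
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
HtmlAgilityPack.HtmlNodeCollection list;
HtmlAgilityPack.HtmlNodeCollection frameList; // not used in this snippet

document.LoadHtml(_html);
// Select every <a> element that has an href attribute (may be null if none match)
list = document.DocumentNode.SelectNodes(".//a[@href]");
```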

All I want to do is extract the links on the page. For large sites, is there any way I can get this to use less CPU?

I was thinking of maybe limiting how much I fetch. What would be my best option here?

Certainly someone must have run into this problem before 🙂
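If limiting the fetch is worth trying, one way to do it is to stop reading the response body after a fixed number of bytes. The sketch below is a minimal illustration, assuming the page is UTF-8 encoded; `CappedReader` and `ReadAtMost` are hypothetical names, and the stream would come from something like `HttpWebResponse.GetResponseStream()`.

```csharp
using System.IO;
using System.Text;

public static class CappedReader
{
    // Hypothetical helper: reads at most maxBytes from the stream and decodes
    // the bytes as UTF-8 (an assumption; real pages may use other encodings).
    public static string ReadAtMost(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes)
        {
            // Read may return fewer bytes than requested, so keep looping
            int read = stream.Read(buffer, total, maxBytes - total);
            if (read == 0)
            {
                break; // end of stream reached before the cap
            }
            total += read;
        }
        return Encoding.UTF8.GetString(buffer, 0, total);
    }
}
```

The trade-off is that any links beyond the cap are lost, and a cut in the middle of a multi-byte character decodes as a replacement character, so this bounds the work per page rather than making the parsing itself cheaper.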

1 Answer

Editorial Team · Answer 1 · 2026-06-12T16:02:36+00:00


“.//a[@href]” is an extremely slow XPath expression. Try replacing it with “//a[@href]”, or with code that simply walks the whole document and checks every A node.
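A minimal sketch of the "walk the whole document" option, assuming the HtmlAgilityPack NuGet package is referenced. It uses `HtmlNode.Descendants(string)` and the `Attributes` indexer instead of the XPath engine; `LinkWalker` and `ExtractHrefs` are hypothetical names.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkWalker
{
    // Hypothetical helper: collects href values without using XPath at all
    public static List<string> ExtractHrefs(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var hrefs = new List<string>();
        // Descendants("a") visits each node once and yields only <a> elements,
        // in document order
        foreach (HtmlNode anchor in document.DocumentNode.Descendants("a"))
        {
            // The indexer returns null when the anchor has no href attribute
            HtmlAttribute href = anchor.Attributes["href"];
            if (href != null)
            {
                hrefs.Add(href.Value);
            }
        }
        return hrefs;
    }
}
```

Unlike `SelectNodes`, this always returns a list (possibly empty), so the caller needs no null check.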

Why this XPath is slow:

“.”: start at the current node
“//”: select all descendant nodes
“a”: keep only “a” elements
“[@href]”: that have an href attribute

Parts 1 and 2 together amount to “for every node, select all of its descendant nodes”, which is very slow.
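To see whether the choice of expression matters on your own pages, you could time both queries against the same parsed document. This is a rough harness, assuming HtmlAgilityPack; `XPathTiming`, `BuildPage` and `CountMatches` are hypothetical names, and the generated page is synthetic test data, not a real site.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using HtmlAgilityPack;

public static class XPathTiming
{
    // Builds a synthetic page with `count` links, each inside its own <div>
    public static string BuildPage(int count)
    {
        var sb = new StringBuilder("<html><body>");
        for (int i = 0; i < count; i++)
        {
            sb.Append("<div><a href=\"/page").Append(i).Append("\">link</a></div>");
        }
        sb.Append("</body></html>");
        return sb.ToString();
    }

    // Runs one XPath query, reporting the match count and the elapsed time
    public static int CountMatches(HtmlDocument document, string xpath, out TimeSpan elapsed)
    {
        var stopwatch = Stopwatch.StartNew();
        HtmlNodeCollection nodes = document.DocumentNode.SelectNodes(xpath);
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        // Many HtmlAgilityPack versions return null rather than an empty collection
        return nodes == null ? 0 : nodes.Count;
    }
}
```

Load one document, call `CountMatches` once with “.//a[@href]” and once with “//a[@href]”, and compare the two `elapsed` values. Run each query a few times, since the first call includes warm-up cost.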
