The Archive Base


Editorial Team
Asked: May 26, 2026

I am trying to parallelize my web parsing tool but the speed gains seem

I am trying to parallelize my web parsing tool, but the speed gains seem very minimal. I have an i7-2600K (4 cores, 8 threads with hyper-threading).

Here is some code to show you the idea. I only show Parallel.ForEach but you get the idea:

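// Needs System.Collections.Concurrent, System.Diagnostics, System.Threading and System.Threading.Tasks
List<string> AllLinks = this.GetAllLinks();
ConcurrentDictionary<string, Topic> AllTopics = new ConcurrentDictionary<string, Topic> ( );

int count = 0;
Stopwatch sw = Stopwatch.StartNew ( );

Parallel.ForEach ( AllLinks, currentLink =>
{
    Topic topic = this.ExtractTopicData ( currentLink );
    AllTopics.TryAdd ( currentLink, topic );

    // count is shared by all worker threads, so a plain ++count is a data race.
    // Increment it atomically and print the elapsed time once per 50 links.
    if ( Interlocked.Increment ( ref count ) % 50 == 0 )
    {
        Console.WriteLine ( sw.ElapsedMilliseconds );
    }
} );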

I get these timings:

Standard foreach loop:
24582
59234
82800
117786
140315

2 links per second


Parallel.For:

21902
31649
41168
49817
59321


5 links per second

Parallel.ForEach:
10217
20401
39056
49220
58125

5 links per second

Firstly, why is the “startup” timing so much slower in Parallel.For?

Other than that, the parallel loops give me a 2.5x speedup over the standard foreach loop. Is this normal?

Is there a setting I can change so that the parallel loops use all the cores?

EDIT:

Here is pretty much what ExtractTopicData does:

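// url is the link passed in to ExtractTopicData
HtmlAgilityPack.HtmlWeb web = new HtmlWeb ( );
HtmlAgilityPack.HtmlDocument doc = web.Load ( url ); // synchronous download, blocks the calling thread
IEnumerable<HtmlNode> links = doc.DocumentNode.SelectNodes ( "//*[@id=\"topicDetails\"]" );

var topic = new Topic();

// SelectNodes returns null when nothing matches, so guard the loop
if ( links != null )
{
    foreach ( var link in links )
    {
        //parse the link data
    }
}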
1 Answer

    Editorial Team, added an answer on May 26, 2026 at 2:35 am

    A brief look at HtmlAgilityPack.HtmlWeb confirms that it uses the synchronous WebRequest API. You are therefore placing long-running, blocking tasks into the ThreadPool (via Parallel). The ThreadPool is designed for short-lived operations that yield the thread back to the pool quickly, so blocking on IO is a big no-no. Because the ThreadPool is reluctant to start new threads (it is not designed for this kind of usage), only a handful of downloads are in flight at any time, and that is what constrains your throughput.
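    If you want to see the thresholds behind that reluctance on your own machine, here is a minimal sketch that only reads them. ThreadPool.GetMinThreads and ThreadPool.GetMaxThreads are standard .NET calls; the PoolLimits class and its method name are hypothetical, made up for this illustration. Up to the minimum, the pool creates worker threads on demand; beyond it, new threads are added only gradually, which is why blocked downloads tend to queue up.

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Returns the worker-thread minimum and maximum of the process-wide ThreadPool.
    // The I/O completion port values are discarded because they are not needed here.
    public static (int Min, int Max) GetWorkerLimits()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        return (minWorker, maxWorker);
    }
}
```

    Calling PoolLimits.GetWorkerLimits() before the Parallel.ForEach and printing both values gives a rough idea of how many blocking downloads can start immediately. The exact numbers depend on the runtime and the machine.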

    Fetch your web content asynchronously (see here and here for the correct API to use, you’ll have to investigate further yourself…) so that you are not tying up the ThreadPool with blocking tasks. You can then feed the decoded response to the HtmlAgilityPack for parsing.
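    As a rough illustration of that approach, here is a sketch that assumes .NET 4.5 or later (async/await and SemaphoreSlim.WaitAsync). AsyncFetcher, FetchAllAsync and the maxConcurrent parameter are hypothetical names, not part of HtmlAgilityPack or of the original code. The download and the parse step are passed in as delegates, so nothing here claims to be the one correct API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncFetcher
{
    // Downloads every link with at most maxConcurrent requests in flight.
    // fetch is the asynchronous download (an assumption: any Task-returning call),
    // parse stands in for the parsing step done on the downloaded text.
    public static async Task<ConcurrentDictionary<string, T>> FetchAllAsync<T>(
        IEnumerable<string> links,
        Func<string, Task<string>> fetch,
        Func<string, T> parse,
        int maxConcurrent)
    {
        var results = new ConcurrentDictionary<string, T>();
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = links.Select(async link =>
            {
                // Wait for a free slot without blocking a thread.
                await gate.WaitAsync();
                try
                {
                    // No pool thread is held while the request is pending.
                    string html = await fetch(link);
                    results.TryAdd(link, parse(html));
                }
                finally
                {
                    gate.Release();
                }
            }).ToList(); // materialize the query so every task is started

            await Task.WhenAll(tasks);
        }
        return results;
    }
}
```

    In real use, fetch could be HttpClient.GetStringAsync on a shared HttpClient, and parse could create an HtmlAgilityPack.HtmlDocument, call LoadHtml on the downloaded string and extract the Topic from it. A sensible value for maxConcurrent has to be found by measuring against the site you are crawling.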

    If you really want to jazz up performance, you’ll also need to consider that WebRequest is incapable of performing asynchronous DNS lookup. IMO this is a terrible flaw in the design of WebRequest.

    The BeginGetResponse method requires some synchronous setup tasks to complete (DNS resolution, proxy detection, and TCP socket connection, for example) before this method becomes asynchronous.

    It makes high performance downloading a real PITA. It’s at about this time that you might consider writing your own HTTP library so that everything can execute without blocking (and therefore starving the ThreadPool).

    As an aside, getting maximum throughput when churning through web pages is a tricky affair. In my experience, you get the code right and are then let down by the routing equipment the traffic has to go through. Many domestic routers simply aren’t up to the job.
