Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8043753
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 5, 20262026-06-05T05:00:38+00:00 2026-06-05T05:00:38+00:00

I am trying to speed up my calculation times by using Parallel.For . I

  • 0

I am trying to speed up my calculation times by using Parallel.For. I have an Intel Core i7 Q840 CPU with 8 cores, but I only manage to get a performance ratio of 4 compared to a sequential for loop. Is this as good as it can get with Parallel.For, or can the method call be fine-tuned to increase performance?

Here is my test code, sequential:

var loops = 200;
var perloop = 10000000;

var sum = 0.0;
for (var k = 0; k < loops; ++k)
{
    var sumk = 0.0;
    for (var i = 0; i < perloop; ++i) sumk += (1.0 / i) * i;
    sum += sumk;
}

and parallel:

sum = 0.0;
Parallel.For(0, loops,
                k =>
                    {
                        var sumk = 0.0;
                        for (var i = 0; i < perloop; ++i) sumk += (1.0 / i) * i;
                        sum += sumk;
                    });

The loop that I am parallelizing involves computation with a “globally” defined variable, sum, but this should only amount to a tiny, tiny fraction of the total time within the parallelized loop.

In Release build (“optimize code” flag set) the sequential for loop takes 33.7 s on my computer, whereas the Parallel.For loop takes 8.4 s, a performance ratio of only 4.0.

In the Task Manager, I can see that the CPU utilization is 10-11% during the sequential calculation, whereas it is only 70% during the parallel calculation. I have tried to explicitly set

ParallelOptions.MaxDegreesOfParallelism = Environment.ProcessorCount

but to no avail. It is not clear to me why not all CPU power is assigned to the parallel calculation?

Sequential vs. parallel CPU utilization

I have noticed that a similar question has been raised on SO before, with an even more disappointing result. However, that question also involved inferior parallelization in a third-party library. My primary concern is parallelization of basic operations in the core libraries.

UPDATE

It was pointed out to me in some of the comments that the CPU I am using only has 4 physical cores, which is visible to the system as 8 cores if hyper threading is enabled. For the sake of it, I disabled hyper-threading and re-benchmarked.

With hyper-threading disabled, my calculations are now faster, both the parallel and also the (what I thought was) sequential for loop. CPU utilization during the for loop is up to approx. 45% (!!!) and 100% during the Parallel.For loop.

Computation time for the for loop 15.6 s (more than twice as fast as with hyper-threading enabled) and 6.2 s for Parallel.For (25% better than when hyper-threading is enabled). Performance ratio with Parallel.For is now only 2.5, running on 4 real cores.

So the performance ratio is still substantially lower than expected, despite hyper-threading being disabled. On the other hand it is intriguing that CPU utilization is so high during the for loop? Could there be some kind of internal parallelization going on in this loop as well?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-05T05:00:40+00:00Added an answer on June 5, 2026 at 5:00 am

    Using a global variable can introduce significant synchronization problems, even when you are not using locks. When you assign a value to the variable each core will have to get access to the same place in system memory, or wait for the other core to finish before accessing it.
    You can avoid corruption without locks by using the lighter Interlocked.Add method to add a value to the sum atomically, at the OS level, but you will still get delays due to contention.

    The proper way to do this is to update a thread local variable to create the partial sums and add all of them to a single global sum at the end. Parallel.For has an overload that does just this. MSDN even has an example using sumation at How To: Write a Parallel.For Loop that has Thread Local Variables

            int[] nums = Enumerable.Range(0, 1000000).ToArray();
            long total = 0;
    
            // Use type parameter to make subtotal a long, not an int
            Parallel.For<long>(0, nums.Length, () => 0, (j, loop, subtotal) =>
            {
                subtotal += nums[j];
                return subtotal;
            },
                (x) => Interlocked.Add(ref total, x)
            );
    

    Each thread updates its own subtotal value and updates the global total using Interlocked.Add when it finishes.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I am trying to speed up an often used query. Using a CompiledQuery seemed
I'm trying to speed up a large RSpec project's tests. In addition to using
I am trying to speed up my MySQL tables by using indices. When I
I'm trying to speed up SELECT queries in MySQL tables which already have some
I am trying to speed up a long running query that I have (takes
I was trying to speed up multiple FTP downloads by using threaded FTP connections.
I'm trying to speed up the start times for my Direct3D app a bit,
I'm trying to speed test jetty (to compare it with using apache) for serving
I am trying to speed up my method using SSE (On Visual Studio). I
I'm trying to speed up the following code, where given a list of strings

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.