Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8659257
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 12, 20262026-06-12T15:58:20+00:00 2026-06-12T15:58:20+00:00

I have written this test application: it goes through iterations from 0 to 9999,

  • 0

I have written this test application: it goes through iterations from 0 to 9999, for each integer in the range it calculates some useless but calculation-intensive function. As a result the program outputs the sum of function values. To make it run on several threads I’m using InterlockedIncrement – if after increment the iteration number is <10000 then a thread processes this iteration, otherwise it terminates.

I am wondering why it is not scaling as well as I would like it to. With 5 threads it runs 8s versus 36s with a single thread. This gives ~4.5 scalability. During my experiments with OpenMP (on slightly different problems) I was getting much better scalability.

The source code is shown below.

I am running Windows7 OS on a Phenom II X6 desktop. Don’t know what other parameters might be relevant.

Could you please help me explain this sub-optimal scalability?
Many thanks.

#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#include <vector>
#include <windows.h>
#include <iostream>
#include <cmath>

using namespace std;
using namespace boost;

struct sThreadData
{
  sThreadData() : iterCount(0), value( 0.0 ) {}
  unsigned iterCount;
  double value;
};

volatile LONG g_globalCounter;
const LONG g_maxIter = 10000;

void ThreadProc( shared_ptr<sThreadData> data )
{
  double threadValue = 0.0;
  unsigned threadCount = 0;

  while( true )
  {
    LONG iterIndex = InterlockedIncrement( &g_globalCounter );
    if( iterIndex >= g_maxIter )
      break;

    ++threadCount;

    double value = iterIndex * 0.12345777;
    for( unsigned i = 0; i < 100000; ++i )
      value = sqrt( value * log(1.0 + value) );

    threadValue += value;
  }

  data->value = threadValue;
  data->iterCount = threadCount;
}

int main()
{
  const unsigned threadCount = 1;

  vector< shared_ptr<sThreadData> > threadData;
  for( unsigned i = 0; i < threadCount; ++i )
    threadData.push_back( make_shared<sThreadData>() );

  g_globalCounter = 0;

  DWORD t1 = GetTickCount();
  vector< shared_ptr<thread> > threads;
  for( unsigned i = 0; i < threadCount; ++i )
    threads.push_back( make_shared<thread>( &ThreadProc, threadData[i] ) );

  double sum = 0.0;
  for( unsigned i = 0; i < threadData.size(); ++i )
  {
    threads[i]->join();
    sum += threadData[i]->value;
  }

  DWORD t2 = GetTickCount();
  cout << "T=" << static_cast<double>(t2 - t1) / 1000.0 << "s\n";

  cout << "Sum= " << sum << "\n";
  for( unsigned i = 0; i < threadData.size(); ++i )
    cout << threadData[i]->iterCount << "\n";

  return 0;
}

Edit:
Attaching sample output of this test program (1 and 5 threads):
enter image description here

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-12T15:58:21+00:00Added an answer on June 12, 2026 at 3:58 pm

    It turned out the the results can be explained by the fact that my CPU supports the AMD Turbo Core technology.

    While in Turbo CORE mode, the AMD Phenom™ II X6 1090T shifts frequency
    speed from 3.2GHz on six cores, to 3.6GHz on three cores

    So the clock frequencies were not the same in single-threaded mode and multi-threaded mode. I was used to playing aroung with multithreading on CPUs that don’t support TurboCore. Below is an image that shows results of

    • AMD OverDrive utility window (a thing that allows to toggle TurboCore on/off)
    • a run with 1 threads with TurboCore ON
    • a run with 1 threads with TurboCore OFF
    • a run with 5 threads
      enter image description here

    Many thanks to people who tried to help.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

To test this problem I have written a minimal windows application. If I force
I have written a C DLL and some C# code to test including this
This is a simple script I have written to test command line argument handling:
I have a haskell file test.hs. In this file I have written a function
I have a test suite DLL written in C# that uses Selenium.This is then
I have written this code inside a servlet to delete certain records from three
I have written this function to auto correct gender to M or F from
I am have written a test console application for retrieving the data and sending
I have written a test where i specify my application context location with annotations.
I have written a simple WPF application in which I wanted to test the

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.