I’m playing around with threads in C++, in particular using them to parallelize a

Question

Editorial Team

Asked: June 6, 2026

I’m playing around with threads in C++, in particular using them to parallelize a map operation.

Here’s the code:

#include <thread>
#include <iostream>
#include <cstdlib>
#include <vector>
#include <math.h>
#include <stdio.h>
#include <time.h>   // clock(), clock_gettime() (POSIX)

double multByTwo(double x){
  return x*2;
}

double doJunk(double x){
  return cos(pow(sin(x*2),3));
}

template <typename T>
void map(T* data, int n, T (*ptr)(T)){
  for (int i=0; i<n; i++)
    data[i] = (*ptr)(data[i]);
}

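template <typename T>
void parallelMap(T* data, int n, T (*ptr)(T)){
  const int NUMCORES = 3;
  std::vector<std::thread> threads;
  for (int i=0; i<NUMCORES; i++){
    // Chunk i covers [begin, end); the last chunk absorbs the remainder
    // of n/NUMCORES so that no trailing elements are skipped.
    int begin = i*(n/NUMCORES);
    int end = (i == NUMCORES-1) ? n : begin + n/NUMCORES;
    threads.push_back(std::thread(&map<T>, data + begin, end - begin, ptr));
  }
  for (std::thread& t : threads)
    t.join();
}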

int main()
{
  int n = 1000000000;
  double* nums = new double[n];
  for (int i=0; i<n; i++)
    nums[i] = i;

  std::cout<<"go"<<std::endl;

  clock_t c1 = clock();

  struct timespec start, finish;
  double elapsed;

  clock_gettime(CLOCK_MONOTONIC, &start);

  // also try with &doJunk
  //parallelMap(nums, n, &multByTwo);
  map(nums, n, &doJunk);

  std::cout << nums[342] << std::endl;

  clock_gettime(CLOCK_MONOTONIC, &finish);

  printf("CPU elapsed time is %f seconds\n", double(clock()-c1)/CLOCKS_PER_SEC);

  elapsed = (finish.tv_sec - start.tv_sec);
  elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;

  printf("Actual elapsed time is %f seconds\n", elapsed);
}

With multByTwo, the parallel version is actually slightly slower (1.01 seconds versus 0.95 seconds of real time), and with doJunk it's faster (51 seconds versus 136 seconds of real time). This implies to me that

1. the parallelization is working, and
2. there is a REALLY large overhead with declaring new threads.

Any thoughts as to why the overhead is so large, and how I can avoid it?

1 Answer

Editorial Team · Answer 1 · 2026-06-06T16:12:38+00:00

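Just a guess: what you're likely seeing is that multByTwo is so cheap per element that the loop saturates memory bandwidth. Each element costs an 8-byte load and an 8-byte store but only a single multiplication, so the cores spend most of their time waiting on RAM. The code will never run any faster no matter how much processor power you throw at it, because it's already going as fast as it can get the bits to and from RAM. Thread creation is very unlikely to be the culprit: starting three threads typically takes well under a millisecond, which is negligible next to a run of about a second. doJunk does far more arithmetic per element, so it is limited by the CPU rather than by memory, and the drop from 136 to 51 seconds is close to the 3x you'd expect from three threads.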
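
To put rough numbers on that guess, here is a small sketch rather than a proper benchmark. The names effectiveBandwidthGBps and spawnJoinSeconds are made up for illustration. The bandwidth estimate assumes every element is read once and written once and ignores caches and any extra write-allocate traffic, so treat the result as a ballpark figure only.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Effective memory traffic in GB/s for an in-place map over n elements.
// Assumption: each element is read once and written once, so the loop
// moves 2 * bytesPerElement bytes per element.
double effectiveBandwidthGBps(double n, double bytesPerElement, double seconds) {
  return 2.0 * n * bytesPerElement / seconds / 1e9;
}

// Wall-clock seconds needed to start and join numThreads threads that do
// no work. This isolates thread creation cost from the cost of the map.
double spawnJoinSeconds(int numThreads) {
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  for (int i = 0; i < numThreads; i++)
    threads.emplace_back([] {});
  for (std::thread& t : threads)
    t.join();
  std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Plugging in the figures from the question, effectiveBandwidthGBps(1e9, 8, 0.95) comes out at roughly 16.8 GB/s for the serial run and effectiveBandwidthGBps(1e9, 8, 1.01) at roughly 15.8 GB/s for the parallel one. Both runs moving data at about the same rate is what you'd expect if RAM, not the CPU, is the limit. If spawnJoinSeconds(3) comes back far below a second on your machine, thread creation is not where the time is going.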
