I wrote simple C++ code that computes an array sum, but with an OpenMP reduction it runs slowly

Asked: May 23, 2026

I wrote simple C++ code that computes the sum of an array, but with an OpenMP reduction the program runs slowly. There are two variants of the program: one is the simplest sum, the other sums a more complex math function. In the code, the complex variant is commented out.

#include <iostream>
#include <ctime>   // clock(), CLOCKS_PER_SEC
#include <cmath>   // sqrt()
#include <omp.h>

using namespace std;

#define N 100000000
#define NUM_THREADS 4

int main() {

  int *arr = new int[N];

  for (int i = 0; i < N; i++) {
    arr[i] = i;
  }

  omp_set_num_threads(NUM_THREADS);
  cout << NUM_THREADS << endl;

  clock_t start = clock();
  long long sum = 0; // 64-bit: the sum of 0..N-1 (about 5e15) does not fit in int
  #pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < N; i++) {
    // sum += sqrt(sqrt(arr[i] * arr[i])); // complex variant
    sum += arr[i]; // simple variant
  }

  double diff = ( clock() - start ) / (double)CLOCKS_PER_SEC;
  cout << "Time " << diff << "s" << endl;

  cout << sum << endl;

  delete[] arr;

  return 0;
}

I compile it with ICPC and GCC:

icpc reduction.cpp -openmp -o reduction -O3
g++ reduction.cpp -fopenmp -o reduction -O3

Processor: Intel Core 2 Duo T5850, OS: Ubuntu 10.10

Here are the execution times of the simple and complex variants, compiled with and without OpenMP.

Simple variant “sum += arr[i];”:

icpc
0.1s without OpenMP
0.18s with OpenMP

g++
0.11s without OpenMP
0.17s with OpenMP

Complex variant “sum += sqrt(sqrt(arr[i] * arr[i]));”:

icpc
2.92s without OpenMP
3.37s with OpenMP

g++
47.97s without OpenMP
48.2s with OpenMP

In the system monitor I see that 2 cores are busy when the program runs with OpenMP and 1 core is busy without it. I tried several thread counts in OpenMP and got no speedup. I don’t understand why the reduction is slow.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T03:08:04+00:00

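The function clock() measures the processor time consumed by the whole process, so the printed time is the sum of the CPU time consumed by all threads. With two threads each doing half of the work, that total is no smaller than the single-threaded figure, even if the program finishes sooner. If you want to see wall-clock time (the real time elapsed from beginning to end), use e.g. the times() function on a POSIX system.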
