I have a function that I eventually want to parallelize. Currently, I call things


Asked by Editorial Team on May 26, 2026


I have a function that I eventually want to parallelize.

Currently, I call things in a for loop.

```
double temp = 0;
int y = 123;  // is a value set by other code
for(vector<double>::iterator i=data.begin(); i != data.end(); i++){
    temp += doStuff(i, y);
}
```

`doStuff` needs to know how far down the list it is, so I use `i - data.begin()` to calculate that.

Next, I'd like to use the `std::for_each` function instead. My challenge is that I need to pass the iterator itself and the value of `y`. I've seen examples of using `bind2nd` to pass a parameter to the function, but how can I pass the iterator as the first parameter?

The Boost FOREACH macro also looks like a possibility; however, I do not know whether it will parallelize automagically like the STL version does.

Thoughts, ideas, suggestions?


Answer by Editorial Team, May 26, 2026:

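If you want real parallelization here, use GCC with tree vectorization enabled (`-O3`) and SIMD instructions (e.g. `-march=native` to get SSE support). If the operation (`doStuff`) is non-trivial, you could apply it ahead of time (`std::transform` or `std::for_each`) and accumulate afterwards (`std::accumulate`), since a plain summation loop is an ideal candidate for SSE vectorization. Note that GCC only vectorizes a floating-point sum if it is allowed to reorder the additions (for example with `-ffast-math`), because reordering can change the rounding of the result.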
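```
#include <algorithm>
#include <numeric>
#include <vector>

void apply_function(double& value)
{
    value *= 3; // just a sample...
}

int main()
{
    std::vector<double> data(1000);
    std::for_each(data.begin(), data.end(), &apply_function);
    // Note 0.0 rather than 0: accumulate's running total has the type of
    // its initial value, so an int 0 would truncate every partial sum.
    double sum = std::accumulate(data.begin(), data.end(), 0.0);
}
```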

Note that although this will not actually run on multiple threads, the speed-up can still be substantial, since SSE instructions process several floating-point values at once *on a single core* (two doubles per 128-bit register).
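If you would rather leave `data` unchanged, the same "apply first, accumulate afterwards" idea can be sketched with `std::transform` writing into a separate buffer. This is a minimal sketch: `triple` and `tripled_sum` are hypothetical names, and `triple` only stands in for the real `doStuff`. Whether the compiler actually emits SSE instructions for either step still depends on the flags discussed above.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sample operation standing in for the real doStuff (hypothetical).
double triple(double value)
{
    return value * 3;
}

// Hypothetical helper: applies triple() to every element, writing into a
// separate buffer so that `data` itself is left untouched, then sums the
// results.
double tripled_sum(const std::vector<double>& data)
{
    std::vector<double> results(data.size());

    // Step 1: apply the operation ahead of time.
    std::transform(data.begin(), data.end(), results.begin(), &triple);

    // Step 2: accumulate; the 0.0 keeps the running total a double.
    return std::accumulate(results.begin(), results.end(), 0.0);
}
```

The extra buffer costs memory, but it keeps each step a simple loop over contiguous doubles, which is the shape the auto-vectorizer handles best.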

If you want true (multithreaded) parallelism, use one of the following:

GNU Parallel Mode

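Compile with `g++ -fopenmp -D_GLIBCXX_PARALLEL`. With that macro defined, standard algorithms such as `std::accumulate` are replaced by their parallel versions automatically; you can also call a parallel version explicitly:

```
#include <parallel/numeric>
double sum = __gnu_parallel::accumulate(data.begin(), data.end(), 0.0);
```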

OpenMP directly

Compile with g++ -fopenmp

```
double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (std::size_t i = 0; i < data.size(); ++i)
{
    sum += do_stuff(i, data[i]);
}
```

This parallelizes the loop across a team of OpenMP threads (by default usually one per logical CPU core; the `OMP_NUM_THREADS` environment variable overrides this), and the per-thread partial sums are combined into `sum` automatically when the loop ends.
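The question's loop can be ported to OpenMP almost unchanged. Below is a minimal sketch, assuming a hypothetical `do_stuff(index, value, y)` that receives the element's position instead of an iterator; `parallel_sum` is also a hypothetical name. Built with `g++ -fopenmp`, the loop runs on a thread team; built without it, the pragma is ignored and the same code runs serially, which makes it easy to compare results.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's doStuff: it receives the
// element's position, its value and the extra parameter y.
double do_stuff(std::size_t index, double value, int y)
{
    return value * static_cast<double>(index) + y;
}

// Sums do_stuff over all elements. With -fopenmp the iterations are split
// across threads and the reduction clause combines the partial sums;
// without -fopenmp the pragma is ignored and the loop runs serially.
double parallel_sum(const std::vector<double>& data, int y)
{
    double sum = 0.0;
    // A signed loop variable keeps this valid for OpenMP versions before
    // 3.0, which only accept signed integer loop variables.
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
    {
        const std::size_t idx = static_cast<std::size_t>(i);
        sum += do_stuff(idx, data[idx], y);
    }
    return sum;
}
```

Because the reduction adds the per-thread partial sums in an unspecified order, a parallel floating-point result can differ from the serial one in the last few bits, and `do_stuff` must be safe to call from several threads at once.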

Final remarks:

You can simulate the binary function for `for_each` by using a stateful function object, although this is not exactly recommended practice. It may also look very inefficient (and without optimization, it is), because function objects are passed by value throughout the STL. However, it is reasonable to expect a compiler to optimize that overhead away completely, especially in simple cases like the following:

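```
#include <algorithm>
#include <cstddef>

struct myfunctor
{
    std::size_t index; // position of the current element
    int y;             // the extra parameter from the question
    double sum;        // running total

    explicit myfunctor(int y_) : index(0), y(y_), sum(0.0) {}

    void operator()(double v)
    {
        sum += v * 3; // again, just a sample; index and y are available here
        ++index;
    }
};

// ...
// for_each works on a copy of the functor and returns it, so read the
// final state from the return value, not from the temporary passed in.
myfunctor result = std::for_each(data.begin(), data.end(), myfunctor(y));
double temp = result.sum;
```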
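If C++11 is available, a lambda that captures a counter and `y` by reference does the same job as a hand-written functor with less code. This is a swapped-in alternative rather than part of the original answer; `sum_with_index` is a hypothetical name and the loop body is only a sample. It relies on `std::for_each` visiting the elements in order, which the standard guarantees for the sequential version.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example: sums value * 3 + index + y, mimicking a doStuff
// that needs both the element's position and the extra parameter y.
double sum_with_index(const std::vector<double>& data, int y)
{
    std::size_t index = 0; // position of the element being visited
    double temp = 0.0;     // running total, like temp in the question

    std::for_each(data.begin(), data.end(), [&](double v) {
        temp += v * 3 + static_cast<double>(index) + y;
        ++index;
    });
    return temp;
}
```

Because the lambda mutates shared state (`index` and `temp`), this version is inherently sequential; for a parallel run, the OpenMP reduction approach is the safer choice.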
