My program contains a for() loop that processes some raw image data, line by line, which I want to parallelize using OpenMP like this:
...
#if defined(_OPENMP)
int const threads = 8;
omp_set_num_threads( threads );
omp_set_dynamic( threads ); // note: any nonzero argument enables dynamic adjustment of the team size
#endif
int line = 0;
#pragma omp parallel private( line )
{
    // tell the compiler to parallelize the next for() loop using static
    // scheduling (i.e. balance workload evenly among threads),
    // while letting each thread process exactly one line in a single run
    #pragma omp for schedule( static, 1 )
    for( line = 0 ; line < max; ++line ) {
        // some processing-heavy code in need of a buffer
    }
} // end of parallel section
....
The question is this:
Is it possible to provide an individual (preallocated) buffer (pointer) to each thread of the team executing my loop using a standard OpenMP pragma/function, thus eliminating the need to allocate a fresh buffer on every loop iteration?
Thanks in advance.
Bjoern
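I may be misunderstanding you, but I think this should do it: allocate the buffer inside the parallel region but before the omp for loop, so that each thread creates its own buffer once and reuses it for every line it processes.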
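A minimal sketch of that idea follows. It is an illustration, not the code from the original post: process_line, process_image and the buffer size parameter are hypothetical names made up for this example, and the per-line work is a placeholder. Variables declared inside a parallel region are private to each thread, so no extra clause is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the processing-heavy per-line work:
// fills the scratch buffer and returns a simple checksum.
static unsigned process_line(int line, std::vector<unsigned char>& scratch) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < scratch.size(); ++i) {
        scratch[i] = static_cast<unsigned char>((static_cast<std::size_t>(line) + i) & 0xFF);
        sum += scratch[i];
    }
    return sum;
}

// Hypothetical driver: processes `max` lines and returns one checksum per line.
std::vector<unsigned> process_image(int max, std::size_t buffer_size) {
    std::vector<unsigned> result(static_cast<std::size_t>(max > 0 ? max : 0));

    #pragma omp parallel
    {
        // Declared inside the parallel region, so every thread owns a
        // separate buffer that is allocated once per thread, not per line.
        std::vector<unsigned char> buffer(buffer_size);

        #pragma omp for schedule(static, 1)
        for (int line = 0; line < max; ++line) {
            // Each iteration writes to a distinct element, so no locking is needed.
            result[static_cast<std::size_t>(line)] = process_line(line, buffer);
        }
    } // buffers are released here, when the parallel region ends

    return result;
}
```

Calling process_image(max, some_size) then costs one allocation per thread. The code also compiles without OpenMP support, in which case the pragmas are ignored and the loop runs serially with a single buffer.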
If you really meant that you want to share the same buffer across different parallel blocks, you'll have to resort to thread-local storage. Boost as well as C++11 have facilities that make this easier, and more portable, than using TlsAlloc and friends directly.
Note that this approach places some of the thread-safety checking burden back on the programmer, because it is perfectly possible to have different omp parallel sections running at the same time, especially when they are nested. Keep in mind that parallel blocks can nest at runtime even when they are not lexically nested. In practice that is usually not good style, and it often results in poor performance. However, it is something you need to be aware of when doing this.
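As a rough sketch of the thread-local variant, the following uses the C++11 thread_local keyword. The names thread_buffer and sum_pass are hypothetical, and the loop body is a placeholder. The sketch assumes that the OpenMP runtime runs its workers on native threads that honor thread_local and keeps them alive between parallel regions; that is how common implementations behave, but check it for your compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the calling thread's scratch buffer,
// growing it on first use. The buffer lives until the thread exits.
std::vector<unsigned char>& thread_buffer(std::size_t min_size) {
    thread_local std::vector<unsigned char> buffer; // one instance per thread
    if (buffer.size() < min_size) {
        buffer.resize(min_size);
    }
    return buffer;
}

// Hypothetical pass over the image: any parallel region can call
// thread_buffer() and will find the buffer its thread allocated earlier.
unsigned long sum_pass(int max, std::size_t buffer_size) {
    unsigned long total = 0;

    #pragma omp parallel for schedule(static, 1) reduction(+ : total)
    for (int line = 0; line < max; ++line) {
        std::vector<unsigned char>& buf = thread_buffer(buffer_size);
        buf[0] = static_cast<unsigned char>(line); // placeholder for real work
        total += buf[0];
    }

    return total;
}
```

Calling sum_pass twice in a row opens two separate parallel regions, and a worker thread that the runtime reuses does not allocate again. The caveat above applies here: with nested parallelism, the thread that opens an inner region also works in it, so it sees the same buffer it was already using in the outer region.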