So here’s the problem. I have a piece of code that works perfectly when executed in a single thread. But once this code is called through TBB, it freezes (or I just don’t have the patience to wait for it to finish!).
The code is too long to post, but imagine this:
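#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

class TBB_Test
{
public:
    TBB_Test(void) { /* initialize the stuff */ }

    // parallel_for requires the body to take the range by const reference;
    // this also lets the sequential call below bind to a temporary range.
    void operator() (const tbb::blocked_range<int> &r) const
    {
        for (int i = r.begin(); i != r.end(); ++i)
        {
            // compute very awesome stuff!
        }
    }
};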
So, when I execute it sequentially:
TBB_Test() (tbb::blocked_range<int>(0, max_value));
it works, but once in parallel:
tbb::parallel_for(tbb::blocked_range<int>(0, max_value, grainsize), TBB_Test());
it freezes instead of being faster than the sequential version.
What could cause such a thing? Two threads trying to read or write at the same place? In our case, writing shouldn’t happen! And we have other situations where the same address is probably read by multiple threads and it doesn’t freeze!
Any idea?
In Visual Studio, at least, when debugging, just set the debugger to break on all kinds of exceptions… it takes a long time, but it’s the right way to do it!
So naturally, it was a memory allocation problem.
The bad solution is to put a mutex around the place where the memory is allocated. This is bad because you end up with your X processors running at full load… mostly waiting on the mutex.
The final approach that we used was to give each slice its own memory allocation scheme. Then, by using a “join”, we merged the data together after it was computed. This way the processors run without any mutex. It does require a lot more memory, but as long as there’s no duplication between threads, you should be fine!
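The per-slice idea can be sketched roughly like this. Again, this is not our real code: it assumes each slice collects its results in a private std::vector, it uses plain std::thread as a stand-in for the TBB scheduler, and compute and run_with_local_buffers are hypothetical names.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical stand-in for the "very awesome stuff" computed per index.
int compute(int i) { return i * i; }

// Each slice fills its own buffer, so no lock is needed while computing.
// Assumes num_slices > 0 and max_value >= 0.
std::vector<int> run_with_local_buffers(int max_value, int num_slices)
{
    std::vector<std::vector<int>> local(num_slices); // one private buffer per slice
    std::vector<std::thread> workers;
    const int chunk = (max_value + num_slices - 1) / num_slices;

    for (int s = 0; s < num_slices; ++s)
    {
        workers.emplace_back([&local, s, chunk, max_value]
        {
            const int begin = std::min(s * chunk, max_value);
            const int end = std::min(begin + chunk, max_value);
            std::vector<int> &mine = local[s]; // only this thread touches local[s]
            mine.reserve(end - begin);
            for (int i = begin; i < end; ++i)
                mine.push_back(compute(i));
        });
    }

    for (std::thread &w : workers)
        w.join();

    // The "join" step: merge the per-slice buffers once all slices are done.
    std::vector<int> merged;
    merged.reserve(max_value);
    for (const std::vector<int> &part : local)
        merged.insert(merged.end(), part.begin(), part.end());
    return merged;
}
```

In TBB itself, the same shape maps onto tbb::parallel_reduce, where the body class has a splitting constructor that starts a fresh local buffer and a join() method that merges another body’s buffer into its own. The extra memory comes from holding all the per-slice buffers until the merge.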
So, lesson learned!