I’m porting some OpenMP parallelized loops to GCD for use on iOS. I’ve encountered a construct that I’m not sure how best to model.
The OpenMP loop does some nontrivial operations on a shared state block, of which it allocates one per possible OpenMP thread, and then combines (reduces, in effect) the results after the loop. Like this (simplified):
const int max_threads = omp_get_max_threads();
state_block state[max_threads];
#pragma omp parallel for shared(state)
for(unsigned int i = 0; i < some_count; i++) {
    // do some stuff
    update_state(state[omp_get_thread_num()]);
}
merge_state_data(state, max_threads);
GCD doesn’t offer a way to know what the maximum number of possible threads is (does it?) or which thread you’re currently on, so this pattern doesn’t work. The state block is nontrivial in size, and the iteration count is large, so allocating one for every iteration of the loop as a pure worst case isn’t plausible either.
It’s conceivable that I could use a custom dispatch source with DISPATCH_SOURCE_TYPE_DATA_ADD to do the state update, but there would be thousands of sources required if I atomized it like that, and that seems wrong.
Is there something I’m missing, either with GCD or generally in the design here?
Thanks.
You can use the POSIX threads API functions pthread_setspecific() and pthread_getspecific() on iOS to set a thread-specific key that points to a temporary state_block, and to retrieve it later in the currently executing block. Apple’s Concurrency Programming Guide does not recommend using pthread_getspecific(), as it might return different values in different block runs, but in your case this is perfectly acceptable (after all, that’s the functionality that you are seeking).
As you do not have prior access to the pool, you have to assign the state blocks on demand:
1. call pthread_getspecific() to get the state_block pointer in the current thread;
2. if it is NULL, allocate a new state_block and assign it to the value of the key using pthread_setspecific();
3. use the state_block.
This might create problems with cleaning things up; for example, think about how to dispose of the state blocks once all tasks have been executed. You might want to use a shared table of pointers and set a unique index in the table as the thread-specific value, or something similar. You might use locks to serialise the access to the index value; this is acceptable as it would only be done once per thread in the pool.
Of course, this approach relies on GCD using a fixed thread pool to implement its concurrent queues.
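To make the on-demand assignment and the shared table concrete, here is a minimal sketch in plain C with pthreads. Everything in it is an assumption for illustration: state_block is reduced to a single counter, MAX_BLOCKS is a guessed upper bound on how many distinct threads will ever touch the loop, and current_state(), merge_and_reset() and run_demo() are hypothetical names. The worker threads in run_demo() only stand in for GCD's pool so that the sketch runs anywhere; under GCD you would presumably call current_state() from inside the block passed to dispatch_apply() instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_BLOCKS 64                         /* assumed bound on pool threads */

typedef struct { long count; } state_block;   /* hypothetical stand-in */

static pthread_key_t   state_key;
static pthread_once_t  key_once   = PTHREAD_ONCE_INIT;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static state_block    *table[MAX_BLOCKS];     /* shared table of all blocks */
static int             table_len  = 0;

static void make_key(void) {
    int rc = pthread_key_create(&state_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's block; allocate and register it on first use. */
static state_block *current_state(void) {
    pthread_once(&key_once, make_key);
    state_block *sb = pthread_getspecific(state_key);
    if (sb == NULL) {
        sb = calloc(1, sizeof *sb);
        if (sb == NULL) abort();
        /* The lock is taken only once per thread, not once per iteration. */
        pthread_mutex_lock(&table_lock);
        if (table_len == MAX_BLOCKS) abort();
        table[table_len++] = sb;
        pthread_mutex_unlock(&table_lock);
        if (pthread_setspecific(state_key, sb) != 0) abort();
    }
    return sb;
}

/* Reduce all registered blocks, then reset them for reuse. The blocks are
   deliberately not freed: a pool thread may still hold a pointer to its
   block and reuse it in a later loop. */
static long merge_and_reset(void) {
    long total = 0;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < table_len; i++) {
        total += table[i]->count;
        table[i]->count = 0;
    }
    pthread_mutex_unlock(&table_lock);
    return total;
}

typedef struct { atomic_long *next; long limit; } job;

/* Stand-in for one pool thread: claims iterations until none are left. */
static void *worker(void *arg) {
    job *j = arg;
    while (atomic_fetch_add(j->next, 1) < j->limit) {
        current_state()->count += 1;          /* stand-in for update_state() */
    }
    return NULL;
}

/* Run `iterations` updates on `nthreads` threads and return the merged sum. */
long run_demo(int nthreads, long iterations) {
    pthread_t tid[nthreads];
    atomic_long next = 0;
    job j = { &next, iterations };
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &j) != 0) abort();
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
    }
    return merge_and_reset();
}
```

Resetting the blocks instead of freeing them sidesteps the cleanup problem at the cost of keeping at most MAX_BLOCKS blocks alive for the life of the process. If that is too much memory, you would need some way to invalidate the stale thread-specific pointers before freeing.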