I’m implementing the producer/consumer problem for homework and have to compare the sequential algorithm with the parallel one. My parallel version only ever runs at the same speed as the sequential one, or slower. I’ve come to the conclusion that the queue is the limiting factor and that parallelizing won’t speed up my algorithm.
Is this the case, or am I just coding it wrong?
int main() {
    long sum = 0;
    unsigned long serial = ::GetTickCount();
    for(int i = 0; i < test; i++){
        enqueue(rand()%54354);
        sum+= dequeue();
    }
    printf("%d \n",sum);
    serial = (::GetTickCount() - serial);
    printf("Serial Program took: %f seconds\n", serial * .001);

    sum = 0;
    unsigned long omp = ::GetTickCount();
    #pragma omp parallel for num_threads(128) default(shared)
    for(int i = 0; i < test; i++){
        enqueue(rand()%54354);
        sum+= dequeue();
    }
    #pragma omp barrier //joins all threads
    omp = (::GetTickCount() - omp);
    printf("%d \n",sum);
    printf("OpenMP Program took: %f seconds\n", omp * .001);
    getchar();
}
Problem #1:
You have rand() inside the parallel region. rand() is not thread-safe: it uses global/static variables, so calling it concurrently from multiple threads will lead to unexpected (possibly undefined) behavior.
That aside, the data races resulting from concurrent calls to rand() will lead to a lot of cache-coherency stalls. This is likely the source of the slowdown.
Problem #2:
Are enqueue() and dequeue() thread-safe?
If they aren’t, then you need to fix that first. If they are, how are you synchronizing them?
If it’s just a critical region that allows only one thread at a time to access the queue, then that kind of defeats the whole purpose of parallelism.
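To make that concrete, here is a minimal sketch of a queue guarded by a single lock. The question does not show its queue, so SafeQueue and try_dequeue are hypothetical names, and std::mutex is only a stand-in for whatever synchronization is actually in use. Every operation takes the same mutex, so no matter how many threads you start, they line up and touch the queue one at a time.

```cpp
#include <mutex>
#include <queue>

// Hypothetical wrapper; the question's real enqueue()/dequeue() are not shown.
class SafeQueue {
public:
    void enqueue(int value) {
        // Only one thread at a time gets past this lock.
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(value);
    }

    // Returns false if the queue is empty instead of blocking.
    bool try_dequeue(int& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            return false;
        }
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;        // one lock shared by every operation
    std::queue<int> items_;   // the underlying, non-thread-safe queue
};
```

This is correct, but with a loop body that does nothing except enqueue and dequeue, almost all of the work happens inside the lock, so there is little left to run in parallel.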
Problem #3:
This line modifies the sum variable in each iteration:
sum+= dequeue();
Note that all the threads will be doing this concurrently, so the updates race with each other. You need to declare sum as a reduction variable.
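As a rough sketch of what that could look like, the function below declares sum as a reduction variable and, to sidestep Problem #1 as well, gives each thread its own generator instead of calling rand(). The name parallel_sum is hypothetical, std::mt19937 is a substitute for rand() rather than what the original code uses, and the queue is left out on purpose so that only these two fixes are visible.

```cpp
#include <random>

// Hypothetical stand-in for the question's loop: sums n pseudo-random values
// in the range [0, 54353], the same range as rand() % 54354.
// Build with OpenMP enabled (for example -fopenmp or /openmp). Without it the
// pragmas are ignored and the function simply runs serially.
long parallel_sum(int n) {
    long sum = 0;

    #pragma omp parallel reduction(+:sum)
    {
        // Each thread owns its generator, so no RNG state is shared.
        std::mt19937 rng(std::random_device{}());
        std::uniform_int_distribution<int> dist(0, 54353);

        #pragma omp for
        for (int i = 0; i < n; i++) {
            sum += dist(rng);  // updates this thread's private copy of sum
        }
    }  // the private copies are added into the shared sum here

    return sum;
}
```

Calling parallel_sum(test) would take the place of the second loop in the question. Leaving out num_threads(128) lets the runtime pick a thread count that matches the machine, which is usually a better starting point than 128.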