I would like to implement a parallel version of the code below using threads in OpenMP. Is there a better way to do this?
/* Program to compute Pi using Monte Carlo methods */
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>
#define SEED 35791246
int main(int argc, char* argv)
{
    int niter=0;
    double x,y;
    int i,count=0; /* # of points in the 1st quadrant of unit circle */
    double z;
    double pi;
    clock_t end_time, start_time;
    printf("Enter the number of iterations used to estimate pi: ");
    scanf("%d",&niter);
    start_time = clock();
    /* initialize random numbers */
    srand(SEED);
    count=0;
    #pragma omp parallel for
    for ( i=0; i<niter; i++) {
        x = (double)rand()/RAND_MAX;
        y = (double)rand()/RAND_MAX;
        z = x*x+y*y;
        if (z<=1) count++;
    }
    #pragma omp task
    pi=(double)count/niter*4;
    #pragma omp barrier
    end_time = clock();
    printf("# of trials= %d , estimate of pi is %g, time= %f \n",niter,pi, difftime(end_time, start_time));
    return 0;
}
It could be improved by correcting some OpenMP bugs. First, since you're summing up (copies of) count in all of the parallel threads, you need to apply a reduction operator at the end of the parallel segment to combine all of those back into a single value. Also, the variables i, x, y, and z need to have individual instances for each parallel thread; you don't want the threads using the same one! To specify all of that, your #pragma directive at the top of the loop should be:
    #pragma omp parallel for private(i, x, y, z) reduction(+:count)
Also, the scope of that is the for loop, so you don't need to do anything else; there will automatically be a synchronization of the threads after the loop exits. (And you need that synchronization to get count to contain all the increments from all threads!) In particular, your task and barrier pragmas are meaningless, as at that point you are back to just one thread, and besides, there's no point in putting that single computation in a parallel task.
And there's the issue that gabe raised about the likely slowness and/or poor randomness of the system random number generator in these cases. You will probably want to investigate the particulars of that on your system, and either give it a new random seed in each thread or use a different random-number generator, depending on what you find.
Besides that, it looks fairly reasonable. Not much else you can do to that algorithm, as it’s short and trivially parallelizable.
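To put those points together, here is a minimal sketch of how the loop might look with a reduction on count and a separate random stream per thread. The function names estimate_pi and next_uniform are hypothetical, and the small linear congruential generator and the way it is seeded per thread are assumptions chosen only to avoid sharing rand() between threads; they are not a vetted parallel random-number scheme, so substitute whatever generator suits your system.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Illustrative 64-bit linear congruential generator (an assumption, not
   part of the original program). Returns a double in [0, 1). */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* Keep the top 53 bits and scale by 2^-53. */
    return (double)(*state >> 11) * (1.0 / 9007199254740992.0);
}

/* Hypothetical helper: Monte Carlo estimate of pi from niter random points. */
double estimate_pi(long niter, uint64_t seed)
{
    long count = 0;   /* points that fall inside the quarter circle */
    assert(niter > 0);

    /* Each thread gets its own copy of count; the copies are summed
       into the shared count when the parallel region ends. */
    #pragma omp parallel reduction(+:count)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Per-thread generator state, derived from the seed and thread id. */
        uint64_t state = seed ^ (0x9E3779B97F4A7C15ULL * (uint64_t)(tid + 1));
        long i;

        /* Iterations are divided among the threads; x and y are declared
           inside the loop body, so they are private automatically. */
        #pragma omp for
        for (i = 0; i < niter; i++) {
            double x = next_uniform(&state);
            double y = next_uniform(&state);
            if (x * x + y * y <= 1.0)
                count++;
        }
    }
    /* Back to a single thread here, so no task or barrier is needed. */
    return 4.0 * (double)count / (double)niter;
}
```

Calling estimate_pi(niter, SEED) from main and compiling with OpenMP enabled (for example, gcc -fopenmp) should give an estimate close to 3.14, though the exact value will vary with the number of threads. If you time it, omp_get_wtime() measures wall-clock time, whereas clock() on most POSIX systems reports CPU time summed over all threads, which can hide any speedup.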