I've been trying to parallelize this piece of code for about two days and keep running into logic errors. The program approximates a definite integral by splitting the interval into very small steps of width dx and summing the area of each slice. I am trying to implement this with OpenMP, but I have no experience with it, so I would appreciate your help. The goal is to split the computation of suma across the threads so that each thread evaluates fewer slices of the integral. The program compiles successfully, but when I run it, it returns wrong results.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char *argv[]){
float down = 1, up = 100, dx, suma = 0, j;
int steps, i, nthreads, tid;
long starttime, finishtime, runtime;
starttime = omp_get_wtime();
steps = atoi(argv[1]);
dx = (up - down) / steps;
nthreads = omp_get_num_threads();
tid = omp_get_thread_num();
#pragma omp parallel for private(i, j, tid) reduction(+:suma)
for(i = 0; i < steps; i++){
for(j = (steps / nthreads) * tid; j < (steps / nthreads) * (tid + 1); j += dx){
suma += ((j * j * j) + ((j + dx) * (j + dx) * (j + dx))) / 2 * dx;
}
}
printf("For %d steps the area of the integral 3 * x^2 + 1 from %f to %f is: %f\n", steps, down, up, suma);
finishtime = omp_get_wtime();
runtime = finishtime - starttime;
printf("Runtime: %ld\n", runtime);
return (0);
}
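The problem lies in your for loop. If you use the for pragma, OpenMP does the loop splitting for you, so the inner loop that computes per-thread bounds by hand is not needed. Note also that omp_get_num_threads() and omp_get_thread_num() are called outside the parallel region, where they return 1 and 0, and that listing tid as private leaves it uninitialized inside the region.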
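A minimal sketch of what that could look like, assuming the integrand is x^3 as in the loop body of the question (the printf text names a different function) and using double instead of float. The function names f and integrate are only illustrative, not part of the original program.

```c
/* Integrand taken from the loop body in the question: x^3.
   (Assumption: the printf text in the question mentions another function.) */
static double f(double x)
{
    return x * x * x;
}

/* Trapezoidal rule on [down, up] with `steps` slices.
   OpenMP distributes the iterations of the single loop across threads;
   reduction(+:suma) gives each thread a private partial sum and adds
   the partial sums together at the end of the loop. */
double integrate(double down, double up, int steps)
{
    double dx = (up - down) / steps;
    double suma = 0.0;
    int i;

    #pragma omp parallel for reduction(+:suma)
    for (i = 0; i < steps; i++) {
        double x = down + i * dx;              /* left edge of slice i */
        suma += (f(x) + f(x + dx)) / 2 * dx;   /* area of one trapezoid */
    }
    return suma;
}
```

Calling integrate(1.0, 100.0, steps) from main replaces both loops of the original. With GCC the pragma only takes effect when compiling with -fopenmp; without it the code still runs, just serially.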
Even if you switched to a parallelisation scheme in which you compute the indices yourself, the code as written would still be problematic: the outer loop should then be executed only nthreads times (once per thread), not steps times.
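For comparison, here is a rough sketch of the manual scheme, where each thread works out its own index range inside a parallel region. The name integrate_manual and the way the remainder is distributed are my own choices, not something from the question, and the integrand is again assumed to be x^3.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

static double cube(double x)
{
    return x * x * x;
}

/* Trapezoidal rule where every thread computes its own block of slices.
   The parallel region plays the role of the "outer loop": its body runs
   exactly once per thread. */
double integrate_manual(double down, double up, int steps)
{
    double dx = (up - down) / steps;
    double suma = 0.0;

    #pragma omp parallel reduction(+:suma)
    {
        int nthreads = 1, tid = 0;   /* serial fallback without OpenMP */
#ifdef _OPENMP
        /* These calls must be made inside the parallel region. */
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
#endif
        /* Contiguous block [first, last) of integer slice indices.
           For the last thread, last equals steps, so no slice is lost. */
        int first = (int)((long long)steps * tid / nthreads);
        int last  = (int)((long long)steps * (tid + 1) / nthreads);
        int i;

        for (i = first; i < last; i++) {
            double x = down + i * dx;
            suma += (cube(x) + cube(x + dx)) / 2 * dx;
        }
    }
    return suma;
}
```

Note that the loop counter is an integer slice index and x is derived from it; the original inner loop instead stepped a float by dx between bounds that were expressed in slice indices, which mixes two different units.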
You should also consider switching to double for increased accuracy.
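To get a feeling for the difference, a small serial sketch like the following accumulates the same trapezoidal sum once in float and once in double. The function names are made up for this illustration and the integrand is assumed to be x^3 on [1, 100], whose exact integral is (100^4 - 1) / 4 = 24999999.75.

```c
/* Trapezoidal sum of x^3 accumulated in single precision. */
float trapezoid_float(float down, float up, int steps)
{
    float dx = (up - down) / steps;
    float suma = 0.0f;
    int i;

    for (i = 0; i < steps; i++) {
        float x = down + i * dx;
        float y = x + dx;
        suma += (x * x * x + y * y * y) / 2 * dx;
    }
    return suma;
}

/* The same sum accumulated in double precision. */
double trapezoid_double(double down, double up, int steps)
{
    double dx = (up - down) / steps;
    double suma = 0.0;
    int i;

    for (i = 0; i < steps; i++) {
        double x = down + i * dx;
        double y = x + dx;
        suma += (x * x * x + y * y * y) / 2 * dx;
    }
    return suma;
}
```

A float has a 24-bit significand, so near 2.5e7 neighbouring float values are 2 apart and the float version cannot even represent the exact answer, while the double version is limited mainly by the trapezoidal rule itself.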