I’m trying to parallelize a ray tracer in C, but the execution time is not dropping as the number of threads increase. The code I have so far is:
main2(thread function):
float **result=malloc(width * sizeof(float*));
int count=0;
for (int px=0;, px<width; ++px)
{
...
for (int py=0; py<height; ++py)
{
...
float *scaled_color=malloc(3*sizeof(float));
scaled_color[0]=...
scaled_color[1]=...
scaled_color[2]=...
result[count]=scaled_color;
count++;
...
}
}
...
return (void *) result;
main:
pthread_t threads[nthreads];
for (i=0;i<nthreads;i++)
{
pthread_create(&threads[i], NULL, main2, &i);
}
float** result_handler;
for (i=0; i<nthreads; i++)
{
pthread_join(threads[i], (void *) &result_handler);
int count=0;
for(j=0; j<width;j++)
{
for(k=0;k<height;k++)
{
float* scaled_color=result_handler[count];
count ++;
printf...
}
printf("\n");
}
}
main2 returns a float ** so that the picture can be printed in order in the main function. Anyone know why the exectution time is not dropping (e.g. it runs longer with 8 threads than with 4 threads when it’s supposed to be the other way around)?
It’s not enough to add threads, you need to actually split the task as well. Looks like you’re doing the same job in every thread, so you get n copies of the result with n threads.