Could someone please provide some suggestions on how I can decrease the following for loop’s runtime through multithreading? Suppose I also have two vectors called ‘a’ and ‘b’.
for (int j = 0; j < 8000; j++){
// Perform an operation and store in the vector 'a'
// Add 'a' to 'b' coefficient wise
}
This for loop is executed many times in my program. The two operations in the for loop above are already optimized, but they only run on one core. However, I have 16 cores available and would like to make use of them.
I’ve tried modifying the loop as follows. Instead of having the vector ‘a’, I have 16 vectors, and suppose that the i-th one is called a[i]. My for loop now looks like
for (int j = 0; j < 500; j++){
for (int i = 0; i < 16; i++){
// Perform an operation and store in the vector 'a[i]'
}
for (int i = 0; i < 16; i++){
// Add 'a[i]' to 'b' coefficient wise
}
}
I use the OpenMp on each of the for loops inside by adding ‘#pragma omp parallel for’ before each of the inner loops. All of my processors are in use but my runtime only increases significantly. Does anyone have any suggestions on how I can decrease the runtime of this loop? Thank You in Advance.
omp creates threads for your program whereever you insert pragma tag, so it’s createing threads for inner tags but the problem is 16 threads are created, each one does 1 operation and then all of them are destroyed using your method. creating and destroying threads take a lot of time so the method you used increases the overal time of your process although it uses all 16 cores. you didn’t have to create inner fors just put
#pragma omp parallel fortag before your 8000 loop it’s up to omp to seperate values between treads so what you did to create the second loop, is omp’s job. that way omp create threads only once and then process 500 numbers useing that each thread and end all of them after that (using 499 less thread creation and destruction)