The Archive Base

Editorial Team
Asked: May 28, 2026

I am using GCC's implementation of OpenMP to try to parallelize a program. The assignment is to add omp pragmas to obtain a speedup on a program that finds amicable numbers.

The original serial program was given (shown below, except for the 3 lines I added, which are marked with a trailing comment). We have to parallelize first just the outer loop, then just the inner loop. The outer loop was easy, and I get close to ideal speedup for a given number of processors. For the inner loop, I get much worse performance than the original serial program. What I am trying to do is a reduction on the sum variable.

Looking at the CPU usage, I am only using ~30% per core. What could be causing this? Is the program continually creating new threads every time it hits the omp parallel for clause? Is there just that much more overhead in doing a barrier for the reduction? Or could it be a memory access issue (e.g. cache thrashing)? From what I have read, most implementations of OpenMP reuse threads over time (e.g. by pooling them), so I am not sure the first problem is what is wrong.

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include <omp.h>
#define numThread 2
int main(int argc, char* argv[]) {
    int ser[29], end, i, j, a, limit, als;
    als = atoi(argv[1]);
    limit = atoi(argv[2]);
    for (i = 2; i < limit; i++) {
        ser[0] = i;
        for (a = 1; a <= als; a++) {
            ser[a] = 1;
            int prev = ser[a-1];
            if ((prev > i) || (a == 1)) {
                end = sqrt(prev);
                int sum = 0;//added this
                #pragma omp parallel for reduction(+:sum) num_threads(numThread)//added this
                for (j = 2; j <= end; j++) {
                    if (prev % j == 0) {
                        sum += j;
                        sum += prev / j;
                    }
                }
                ser[a] = sum + 1;//added this
            }
        }
        if (ser[als] == i) {
            printf("%d", i);
            for (j = 1; j < als; j++) {
                printf(", %d", ser[j]);
            }
            printf("\n");
        }
    }
}
1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 8:19 pm

    An OpenMP thread team is set up each time execution enters a parallel region. Here that means the team is set up again (and, unless the runtime pools its threads, the threads are created again) every time the inner loop starts.

    To let the threads be reused, open one larger parallel region that controls the lifetime of the team, and control the parallelism of the outer and inner loops specifically inside it, as in the listing at the end of this answer.

    With this fix, the execution time for test.exe 1 1000000 went down from 43 s to 22 s (and the number of threads reflects the numThread defined value + 1).

    PS: Perhaps stating the obvious, but parallelizing the inner loop does not look like a sound way to improve performance. That is likely the whole point of the exercise, though, so I won't critique the question for that.

    #include<stdio.h>
    #include<stdlib.h>
    #include<math.h>
    #include <omp.h>
    
    #define numThread 2
    int main(int argc, char* argv[]) {
        int ser[29], end, i, j, a, limit, als;
        als = atoi(argv[1]);
        limit = atoi(argv[2]);
    #pragma omp parallel num_threads(numThread)
        {
    #pragma omp single
            for (i = 2; i < limit; i++) {
                ser[0] = i;
                for (a = 1; a <= als; a++) {
                    ser[a] = 1;
                    int prev = ser[a-1];
                    if ((prev > i) || (a == 1)) {
                        end = sqrt(prev);
                        int sum = 0;//added this
    #pragma omp parallel for reduction(+:sum) //added this
                        for (j = 2; j <= end; j++) {
                            if (prev % j == 0) {
                                sum += j;
                                sum += prev / j;
                            }
                        }
                        ser[a] = sum + 1;//added this
                    }
                }
                if (ser[als] == i) {
                    printf("%d", i);
                    for (j = 1; j < als; j++) {
                        printf(", %d", ser[j]);
                    }
                    printf("\n");
                }
            }
        }
    }
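
A possible variation on the listing above, offered as a sketch and not as the answer's method. The inner `#pragma omp parallel for` there opens a second, nested parallel region, and when nested parallelism is disabled (commonly the default) such an inner region runs with a single thread. One way to keep a single team alive and still share the inner loop is to let every thread execute the outer loop and put a worksharing `#pragma omp for` inside it. The names `find_perfect_numbers`, `out`, `cap` and `NUM_THREADS` are hypothetical, and the sketch covers only the single-step case (numbers equal to the sum of their own proper divisors), not the full chain logic of the program.

```c
#define NUM_THREADS 2 /* assumption: plays the role of numThread above */

/* Hypothetical helper, not part of the original program: stores the perfect
 * numbers below `limit` in `out` (at most `cap` entries), returns the count. */
int find_perfect_numbers(int limit, int *out, int cap) {
    int sum = 0;   /* shared: target of the reduction */
    int count = 0; /* shared: only written inside a `single` block */

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int end = 1; /* private: floor(sqrt(i)), updated incrementally */

        /* Every thread walks the same sequence of i values. */
        for (int i = 2; i < limit; i++) {
            if ((end + 1) * (end + 1) <= i)
                end++;

            /* One thread resets the accumulator (1 divides every i);
             * the implicit barrier after `single` publishes the reset. */
            #pragma omp single
            sum = 1;

            /* The existing team splits the divisor search; no new
             * parallel region is entered here. */
            #pragma omp for reduction(+:sum)
            for (int j = 2; j <= end; j++) {
                if (i % j == 0) {
                    sum += j;
                    if (j != i / j) /* do not count a square root twice */
                        sum += i / j;
                }
            }

            /* The barrier after `omp for` guarantees sum is complete. The
             * barrier after this `single` keeps sum from being reset
             * while it is still being read. */
            #pragma omp single
            {
                if (sum == i && count < cap)
                    out[count++] = i;
            }
        }
    }
    return count;
}
```

Built with `gcc -fopenmp`, a call such as `find_perfect_numbers(10000, buf, 8)` should return 4 and fill `buf` with 6, 28, 496 and 8128. Each inner loop still ends in a barrier, so this is not expected to beat parallelizing the outer loop.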
    