Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8693299
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 13, 20262026-06-13T00:31:11+00:00 2026-06-13T00:31:11+00:00

Consider the code below, taken from here . For this code, I get the

  • 0

Consider the code below, taken from here. For this code, I get the following execution times:

time ./fibomp 40
Number of threads (OpenMP v200805): 2
finonacci(40) = 102334155

real    0m3.193s
user    0m3.180s
sys     0m0.000s

$ export OMP_NUM_THREADS=1
$ time ./fibomp 40
Number of threads (OpenMP v200805): 1
finonacci(40) = 102334155

real    0m3.224s
user    0m3.216s
sys     0m0.000s

So as you can see, there’s not much speed up, definitely not the 2x speedup Ruud mentions in his Tue Nov 01, 2011 1:41 am email. I’m running this on a dual core machine (could that be it?). What am I doing wrong? (BTW, BONUS points, what is the ptime command? Some SPARC Unix command?)

long comp_fib_numbers(int n)
{
  long fnm1, fnm2, fn;
  if ( n == 0 || n == 1 ) return(n);

  // In case the sequence gets too short, execute the serial version
  if ( n < 20 )
  {
     return(comp_fib_numbers(n-1)+comp_fib_numbers(n-2));
  }
  else
  {
     #pragma omp task shared(fnm1)
       fnm1 = comp_fib_numbers(n-1);
     #pragma omp task shared(fnm2)
       fnm2 = comp_fib_numbers(n-2);
     #pragma omp taskwait
       fn = fnm1 + fnm2;
       return(fn);
   }

}
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-13T00:31:11+00:00Added an answer on June 13, 2026 at 12:31 am

    First of all, just to be sure, since you state that htop shows that a single core is being used, make sure that you have enabled OpenMP support in your compiler. The option to do so is -fopenmp for GCC, -xopenmp for Sun/Oracle compilers and -openmp for Intel compilers.

    Second of all, n = 20 might be too low of a cut off for the parallel implementation. A shameless plug – see this course material from a workshop on OpenMP that a colleague of mine gave some months ago. Several parallel versions with tasking are discussed there, starting from slide 20.

    Third, ptime is a Solaris command, not specific to SPARC since it is also available in the x86 version. Many process related Solaris commands have the p prefix in their names. Note that in your case time is more likely to be the built-in implementation that Bash provides rather than the standalone binary.

    Fourth, and may be the real answer to your question – you are missing a parallel region in your code so the task directives don’t work at all 🙂 You should rewrite your code as following:

    long comp_fib_numbers(int n)
    {
       long fnm1, fnm2, fn;
       if ( n == 0 || n == 1 ) return(n);
    
       // In case the sequence gets too short, execute the serial version
       if ( n < 20 )
       {
          return(comp_fib_numbers(n-1)+comp_fib_numbers(n-2));
       }
       else
       {
          #pragma omp parallel  // <--- You are missing this one parallel region
          {
             #pragma omp single
             {
                #pragma omp task shared(fnm1)
                fnm1 = comp_fib_numbers(n-1);
                #pragma omp task shared(fnm2)
                fnm2 = comp_fib_numbers(n-2);
             }
             #pragma omp taskwait
          }
    
          fn = fnm1 + fnm2;
          return(fn);
       }
    
    }
    

    You could make the code even more terse by using the if clause to switch of the parallel region:

    long comp_fib_numbers(int n)
    {
       long fnm1, fnm2, fn;
       if ( n == 0 || n == 1 ) return(n);
    
       #pragma omp parallel if(n >= 20)
       {
          #pragma omp single
          {
             #pragma omp task shared(fnm1)
             fnm1 = comp_fib_numbers(n-1);
             #pragma omp task shared(fnm2)
             fnm2 = comp_fib_numbers(n-2);
          }
          #pragma omp taskwait
       }
    
       fn = fnm1 + fnm2;
       return(fn);
    }
    

    If n happens to be less than 20, then the parallel region would execute single threaded. Since parallel regions are usually extracted in separate functions, there would still be an additional function call, unless the compiler choses to produce duplicate code. That’s why it is recommended that the serial implementation is extracted in its own function:

    long comp_fib_numbers_serial(int n)
    {
       if ( n == 0 || n == 1 ) return(n);
    
       return (comp_fib_numbers_serial(n-1) + comp_fib_numbers_serial(n-2));
    }
    
    long comp_fib_numbers(int n)
    {
       long fnm1, fnm2, fn;
       if ( n < 20 ) return comp_fib_numbers_serial(n);
    
       #pragma omp parallel
       {
          #pragma omp single
          {
             #pragma omp task shared(fnm1)
             fnm1 = comp_fib_numbers(n-1);
             #pragma omp task shared(fnm2)
             fnm2 = comp_fib_numbers(n-2);
          }
          #pragma omp taskwait
       }
    
       fn = fnm1 + fnm2;
       return(fn);
    }
    

    Edit: Now that I’ve looked at the code that you have linked to, I can see that the call to comp_fib_numbers is embedded into a parallel region. So just disregard my comment about the missing parallel region if you already have one in your code. I will leave it here just for completeness. Try tweaking the value at which the switch between the parallel and the serial version occurs. On modern processors it might be quite high and the example that you have seen is quite old. Also make sure that no dynamic teams are used by either setting the environment variable OMP_DYNAMIC to false (or to FALSE) or by calling omp_set_dynamic(0); someplace before the parallel region.

    You haven’t stated what your compiler is but mind that OpenMP 3.0 is supported by GCC since version 4.4, by Intel compilers since version 11.0, by Sun/Oracle compilers since version I_dont_know and is not supported at all by the Visual C/C++ compilers.

    Observed speedup on an quad-socket Intel Xeon X7350 system (old pre-Nehalem system with FSB)

    $ time OMP_NUM_THREADS=1 ./fib.x 40
    finonacci(40) = 102334155
    OMP_NUM_THREADS=1 ./fib.x 40  1.86s user 0.00s system 99% cpu 1.866 total
    $ time OMP_NUM_THREADS=2 ./fib.x 40
    finonacci(40) = 102334155
    OMP_NUM_THREADS=2 ./fib.x 40  1.96s user 0.00s system 169% cpu 1.161 total
    

    With the cut-off set to 25 (seems to be the optimal value for the X7350):

    $ time OMP_NUM_THREADS=2 ./fib.x 40
    finonacci(40) = 102334155
    OMP_NUM_THREADS=2 ./fib.x 40  1.95s user 0.00s system 169% cpu 1.153 total
    

    With the cut-off set to 25 and a separate function for the serial implementation:

    $ time OMP_NUM_THREADS=2 ./fib.x 40
    finonacci(40) = 102334155
    OMP_NUM_THREADS=2 ./fib.x 40  1.52s user 0.00s system 171% cpu 0.889 total
    

    See how the user time decreases by some 400 ms. This is because of the removed overhead.

    These were measured with the code from the site that you have linked to. Used compiler is GCC 4.4.6 on a 64-bit Scientific Linux 6.2 system.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Consider the JavaScript code below, inspired from the YUI documentation on YAHOO.lang.extend . In
I am having trouble finding an answer to this. Consider the clipping code below:
Consider this code below: <a href="javascript:void(-1)" id="a1">a1</a> <a href="javascript:void(-1)" id="a2">a2</a> <script type="text/javascript"> var buttons
Consider the code below. In essence, we get 2 strings, then we add these
Consider the code below: StockcheckJobs = (from job in (from stockcheckItem in MDC.StockcheckItems where
Consider the following code: time_t t; t = time( NULL ); elog << timezone:
Consider this F# code to sum the numbers below i that are multiples of
Consider the code below $('.item_select').each(List); function List() { var x= this.siblings('.fooClass'); } When I
Consider the code below: this.usedIds = 0; this.SendData = function(data) { var id =
Consider the code below, private void Convert_Click(Object sender, RoutedEventArgs e) { string[] strCmdLineParams =

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.