Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9006349
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 16, 20262026-06-16T01:23:15+00:00 2026-06-16T01:23:15+00:00

When I was porting some fortran code to c, it surprised me that the

  • 0

When I was porting some fortran code to c, it surprised me that the most of the execution time discrepancy between the fortran program compiled with ifort (intel fortran compiler) and the c program compiled with gcc, comes from the evaluations of trigonometric functions (sin, cos). It surprised me because I used to believe what this answer explains, that functions like sine and cosine are implemented in microcode inside microprocessors.

In order to spot the problem more explicitly I made a small test program in fortran

program ftest
  implicit none
  real(8) :: x
  integer :: i
  x = 0d0
  do i = 1, 10000000
    x = cos (2d0 * x)
  end do
  write (*,*) x
end program ftest

On intel Q6600 processor and 3.6.9-1-ARCH x86_64 Linux
I get with ifort version 12.1.0

$ ifort -o ftest ftest.f90 
$ time ./ftest
  -0.211417093282753     

real    0m0.280s
user    0m0.273s
sys     0m0.003s

while with gcc version 4.7.2 I get

$ gfortran -o ftest ftest.f90 
$ time ./ftest
  0.16184945593939115     

real    0m2.148s
user    0m2.090s
sys     0m0.003s

This is almost a factor of 10 difference! Can I still believe that the gcc implementation of cos is a wrapper around the microprocessor implementation in a similar way as this is probably done in the intel implementation? If this is true, where is the bottle neck?

EDIT

According to comments, enabled optimizations should improve the performance. My opinion was that optimizations do not affect the library functions … which does not mean that I don’t use them in nontrivial programs. However, here are two additional benchmarks (now on my home computer intel core2)

$ gfortran -o ftest ftest.f90
$ time ./ftest
  0.16184945593939115     

real    0m2.993s
user    0m2.986s
sys     0m0.000s

and

$ gfortran -Ofast -march=native -o ftest ftest.f90
$ time ./ftest
  0.16184945593939115     

real    0m2.967s
user    0m2.960s
sys     0m0.003s

Which particular optimizations did you (commentators) have in mind? And how can compiler exploit a multi-core processor in this particular example, where each iteration depends on the result of the previous one?

EDIT 2

The benchmark tests of Daniel Fisher and Ilmari Karonen made me think that the problem might be related to the particular version of gcc (4.7.2) and maybe to a particular build of it (Arch x86_64 Linux) that I am using on my computers. So I repeated the test on the intel core i7 box with debian x86_64 Linux, gcc version 4.4.5 and ifort version 12.1.0

$ gfortran -O3 -o ftest ftest.f90
$ time ./ftest
  0.16184945593939115     

real    0m0.272s
user    0m0.268s
sys     0m0.004s

and

$ ifort -O3 -o ftest ftest.f90
$ time ./ftest
  -0.211417093282753     

real    0m0.178s
user    0m0.176s
sys     0m0.004s

For me this is a very much acceptable performance difference, which would never make me ask this question. It seems that I will have to ask on Arch Linux forums about this issue.

However, the explanation of the whole story is still very welcome.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-16T01:23:16+00:00Added an answer on June 16, 2026 at 1:23 am

    Most of this is due to differences in the math library. Some points to consider:

    • Yes, the x86 processors with the x87 unit has fsin and fcos instructions. However, they are implemented in microcode, and there is not particular reason why they must be faster than a pure software implementation.
    • GCC does not have it’s own math library, but rather uses the system provided one. On Linux this is typically provided by glibc.
    • 32-bit x86 glibc uses fsin/fcos.
    • x86_64 glibc uses software implementations using the SSE2 unit. For a long time, this was a lot slower than the 32-bit glibc version which just used the x87 instructions. However, improvements have (somewhat recently) been made, so depending on which glibc version you have the situation might not be as bad anymore as it used to be.
    • The Intel compiler suite is blessed with a VERY fast math library (libimf). Additionally, it includes vectorized transcendental math functions, which can often further speed up loops with these functions.
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm looking at porting some code that uses java.util.concurrent.ConcurrentSkipListSet to an environment where this
I'm working on porting some PHP code to C, that contacts a web API.
I'm porting some old code that uses the int enum pattern to Enum and
I am porting some of my C++ code that was highly specialized to a
I'm porting some code to Windows and am stumped. There is some code that
I'm porting some code from MATLAB to C++ and discovered that MATLAB's sin() and
I'm porting some Python code that uses raw TCP sockets to ZeroMQ for better
I'm porting some open source code so that it can build on another c++
I'm porting some PHP to C++. Some of our database code stores time values
I'm porting some code to Haskell that extensively uses the concept of a peeking

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.