Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6112651
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 23, 20262026-05-23T14:46:09+00:00 2026-05-23T14:46:09+00:00

Using gcc 4.6 with -O3, I have timed the following four codes using the

  • 0

Using gcc 4.6 with -O3, I have timed the following four codes using the simple time command

#include <iostream>
int main(int argc, char* argv[])
{
  double val = 1.0; 
  unsigned int numIterations = 1e7; 
  for(unsigned int ii = 0;ii < numIterations;++ii) {
    val *= 0.999;
  }
  std::cout<<val<<std::endl;
}

Case 1 runs in 0.09 seconds

#include <iostream>
int main(int argc, char* argv[])
{
  double val = 1.0;
  unsigned int numIterations = 1e8;
  for(unsigned int ii = 0;ii < numIterations;++ii) {
    val *= 0.999;
  }
  std::cout<<val<<std::endl;
}

Case 2 runs in 17.6 seconds

int main(int argc, char* argv[])
{
  double val = 1.0;
  unsigned int numIterations = 1e8;
  for(unsigned int ii = 0;ii < numIterations;++ii) {
    val *= 0.999;
  }
}

Case 3 runs in 0.8 seconds

#include <iostream>
int main(int argc, char* argv[])
{
  double val = 1.0;
  unsigned int numIterations = 1e8;
  for(unsigned int ii = 0;ii < numIterations;++ii) {
    val *= 0.999999;
  }
  std::cout<<val<<std::endl;
}

Case 4 runs in 0.8 seconds

My question is why is the second case so much slower than all the other cases? Case 3 shows that removing the cout brings the runtime back in line with what is expected. And Case 4 shows that changing the multiplier also greatly reduces the runtime. What optimization or optimizations are not occuring in case 2 and why?

Update:

When I originally ran these tests there was no separate variable numIterations, the value was hard-coded in the for loop. In general, hard-coding this value made things run slower than the cases given here. This is especially true for Case 3 which ran almost instantly with the numIterations variable as shown above, indicating James McNellis is correct about the entire loop being optimized out. I’m not sure why hard-coding the 1e8 into the for loop prevents the removal of the loop in Case 3 or makes things slower in the other cases, however, the basic premise of Case 2 being significantly slower is even more true.

Diffing the assembly output gives for the cases above gives

Case 2 and Case 1:
movl $100000000, 16(%esp)


movl $10000000, 16(%esp)

Case 2 and Case 4:
.long -652835029
.long 1072691150


.long -417264663
.long 1072693245

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-23T14:46:10+00:00Added an answer on May 23, 2026 at 2:46 pm

    René Richter was on the right track regarding underflow. The smallest positive normalized number is about 2.2e-308. With f(n)=0.999**n, this limit is reached after about 708,148 iterations. The remaining iterations are stuck with unnormalized computations.

    This explains why 100 million iterations take slightly more than 10 times the time needed to perform 10 million. The first 700,000 are done using the floating point hardware. Once you hit denormalized numbers, the floating point hardware punts; the multiplication is done in software.

    Note that this would not be the case if the repeated computation properly calculated 0.999**N. Eventually the product would reach zero, and from that point on the multiplications would once again be done with the floating point hardware. That is not what happens because 0.999 * (smallest denormalized number) is the smallest denormalized number. The continued product eventually bottoms out.

    What we can do is change the exponent. An exponent of 0.999999 will keep the continued product within the realm of normalized numbers for 708 million iterations. Taking advantage of this,

    Case A  : #iterations = 1e7, exponent=0.999, time=0.573692 sec
    Case B  : #iterations = 1e8, exponent=0.999, time=6.10548 sec
    Case C  : #iterations = 1e7, exponent=0.999999, time=0.018867 sec
    Case D  : #iterations = 1e8, exponent=0.999999, time=0.189375 sec
    

    Here you can easily see how much faster the floating point hardware is than is the software emulation.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm using gcc 4.3.2. I have the following code (simplified): #include <cstdlib> template<int SIZE>
I have the following code compiled by gcc: #include <iostream> using namespace std; class
I have the following program #include <stdio.h> int main(void) { unsigned short int length
I have a c codes which can be compiled in Linux using gcc. But
I have an application which i build using gcc on linux host for ARM
Greetings, SO. I have some code which I've made attempts at compiling using gcc,
gcc 4.4.2 / Visual Studio C++ 2008 I have been using cmake on linux,
GCC compiles (using gcc --omit-frame-pointer -s ): int the_answer() { return 42; } into
I am using gcc on linux. I was previously using the following code :
gcc 4.4.3 c89 I have the following source file. However, I get a stack

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.