The Archive Base

Editorial Team
Asked: May 26, 2026


I was trying to measure the speed difference between single-precision and double-precision division in C++.

Here is the simple code I wrote.

#include <iostream>
#include <time.h>

int main(int argc, char *argv[])
{
  float     f_x = 45672.0;
  float     f_y = 67783.0;
  double    d_x = 45672.0;
  double    d_y = 67783.0;

  float     f_answer;
  double    d_answer;

  clock_t   start, stop;
  int       N = 200000000; // 2*10^8

  // Time N single-precision divisions.
  start = clock();
  for (int i = 0; i < N; ++i)
  {
    f_answer = f_x/f_y;
  }
  stop = clock();
  std::cout << "Single precision:" << (stop-start)/(double)CLOCKS_PER_SEC << "    " << f_answer << std::endl;

  // Time N double-precision divisions.
  start = clock();
  for (int i = 0; i < N; ++i)
  {
    d_answer = d_x/d_y;
  }
  stop = clock();
  std::cout << "Double precision:" << (stop-start)/(double)CLOCKS_PER_SEC << "   " << d_answer << std::endl;

  return 0;
}

When I compiled the code without optimization, as g++ test.cpp, I got the following output:

Desktop: ./a.out
Single precision:8.06    0.673797
Double precision:12.68   0.673797

But if I compile it with g++ -O3 test.cpp, I get:

Desktop: ./a.out
Single precision:0    0.673797
Double precision:0   0.673797

How did I get such a drastic performance increase? The time being shown in the second case is 0 because of the low resolution of the clock() function. Did the compiler somehow detect that each for loop iteration is independent of the previous iterations?

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 8:34 pm

    Looking at the assembly that you get from g++ -O3 -S, it’s quite apparent that the loops and all of your floating-point calculations (aside from those involving the time) were optimized out of existence:

            .section        .text.startup,"ax",@progbits
            .p2align 4,,15
            .globl  main
            .type   main, @function
    main:
    .LFB970:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            pushq   %rbx
            .cfi_def_cfa_offset 24
            .cfi_offset 3, -24
            subq    $24, %rsp
            .cfi_def_cfa_offset 48
            call    clock
            movq    %rax, %rbx
            call    clock
            movq    %rax, %rbp
            movl    $.LC0, %esi
            movl    std::cout, %edi
            subq    %rbx, %rbp
            call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
    

    See the two calls to clock, one right after the other? And before those, only some stack maintenance instructions. Yep, those loops are completely gone.

    You only use f_answer or d_answer to print out an answer that can be trivially calculated at compile time, and the compiler can see that. There’s no point in even having them. And if there’s no point in having them, there’s no point in having f_x, f_y, d_x, or d_y either. All gone.

    To solve this, you need to have each iteration of the loop depend on the results from the last iteration. Here is my solution to this problem. I use the complex template to do some of the calculations involved in computing the Mandelbrot set:

    #include <iostream>
    #include <time.h>
    #include <complex>
    
    int main(int argc, char *argv[])
    {
       using ::std::complex;
       using ::std::cout;
    
       const complex<float> f_coord(0.1, 0.1);
       const complex<double> d_coord(0.1, 0.1);
    
       complex<float> f_answer(0, 0);
       complex<double> d_answer(0, 0);
    
       clock_t   start, stop;
       const unsigned int N = 200000000; //2*10^8
    
       start = clock();
       for (unsigned int i = 0; i < N; ++i)
       {
          f_answer = (f_answer * f_answer) + f_coord;
       }
       stop = clock();
       cout << "Single Precision: " << (stop-start)/(double)CLOCKS_PER_SEC
            << "    " << f_answer << '\n';
    
    
       start = clock();
       for (unsigned int i = 0; i < N; ++i)
       {
          d_answer = (d_answer * d_answer) + d_coord;
       }
       stop = clock();
       cout << "Double precision: " <<(stop-start)/(double)CLOCKS_PER_SEC
            << "   " << d_answer << '\n';
    
       return 0;
    }
    