I was trying to measure the speed difference of single precision division vs double

Question

Editorial Team

Asked: May 26, 2026 (2026-05-26T20:34:31+00:00)

I was trying to measure the speed difference between single-precision and double-precision division in C++.

Here is the simple code that I have written.

#include <iostream>
#include <time.h>

int main(int argc, char *argv[])
{

  float     f_x = 45672.0;
  float     f_y = 67783.0;
  double    d_x = 45672.0;
  double    d_y = 67783.0;

  float     f_answer;
  double    d_answer;

  clock_t   start,stop;
  int       N = 200000000; //2*10^8


  start = clock();
  for (int i = 0; i < N; ++i)
  {
    f_answer = f_x/f_y;
  }
  stop = clock();
  std::cout<<"Single precision:"<< (stop-start)/(double)CLOCKS_PER_SEC<<"    "<<f_answer <<std::endl;


  start = clock();
  for (int i = 0; i < N; ++i)
  {
    d_answer = d_x/d_y;
  }
  stop = clock();
  std::cout<<"Double precision:" <<(stop-start)/(double)CLOCKS_PER_SEC<<"   "<< d_answer<<std::endl;

  return 0;
}

When I compiled the code without optimization, as g++ test.cpp, I got the following output:

Desktop: ./a.out
Single precision:8.06    0.673797
Double precision:12.68   0.673797

But if I compile this with g++ -O3 test.cpp, then I get:

Desktop: ./a.out
Single precision:0    0.673797
Double precision:0   0.673797

How did I get such a drastic performance increase? The time being shown in the second case is 0 because of the low resolution of the clock() function. Did the compiler somehow detect that each for loop iteration is independent of the previous iterations?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T20:34:32+00:00

Looking at the assembly that you get from g++ -O3 -S, it's quite apparent that the loops and all of your floating-point calculations (aside from those involving the time) were optimized out of existence:

        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB970:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        pushq   %rbx
        .cfi_def_cfa_offset 24
        .cfi_offset 3, -24
        subq    $24, %rsp
        .cfi_def_cfa_offset 48
        call    clock
        movq    %rax, %rbx
        call    clock
        movq    %rax, %rbp
        movl    $.LC0, %esi
        movl    std::cout, %edi
        subq    %rbx, %rbp
        call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)

See the two calls to clock, one right after the other? And before those, only some stack maintenance instructions. Yep, those loops are completely gone.

You only use f_answer or d_answer to print out an answer that can be trivially calculated at compile time, and the compiler can see that. There’s no point in even having them. And if there’s no point in having them, there’s no point in having f_x, f_y, d_x, or d_y either. All gone.
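To make the "trivially calculated at compile time" point concrete, here is a minimal sketch using the same operands as the question. The names f_quotient and d_quotient are hypothetical and not part of the original program. constexpr forces the division to happen during compilation, which is roughly what the optimizer is also allowed to do on its own at -O3.

```cpp
// Same operands as in the question. constexpr requires the compiler to
// evaluate each division itself, so no divide instruction is needed at run time.
constexpr float  f_quotient = 45672.0f / 67783.0f;
constexpr double d_quotient = 45672.0  / 67783.0;

// These checks are evaluated during compilation as well. The bounds bracket
// the 0.673797 value printed in the question's output.
static_assert(f_quotient > 0.6737f && f_quotient < 0.6739f,
              "single-precision quotient folded at compile time");
static_assert(d_quotient > 0.6737 && d_quotient < 0.6739,
              "double-precision quotient folded at compile time");
```

If this compiles, both quotients were known before the program ever ran, so a loop that only recomputes them has nothing left to measure.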

To solve this, you need to have each iteration of the loop depend on the results from the last iteration. Here is my solution to this problem. I use the std::complex template to do some of the calculations involved in computing the Mandelbrot set:

#include <iostream>
#include <time.h>
#include <complex>

int main(int argc, char *argv[])
{
   using ::std::complex;
   using ::std::cout;

   const complex<float> f_coord(0.1, 0.1);
   const complex<double> d_coord(0.1, 0.1);

   complex<float> f_answer(0, 0);
   complex<double> d_answer(0, 0);

   clock_t   start, stop;
   const unsigned int N = 200000000; //2*10^8

   start = clock();
   for (unsigned int i = 0; i < N; ++i)
   {
      f_answer = (f_answer * f_answer) + f_coord;
   }
   stop = clock();
   cout << "Single precision: " << (stop-start)/(double)CLOCKS_PER_SEC
        << "    " << f_answer << '\n';


   start = clock();
   for (unsigned int i = 0; i < N; ++i)
   {
      d_answer = (d_answer * d_answer) + d_coord;
   }
   stop = clock();
   cout << "Double precision: " <<(stop-start)/(double)CLOCKS_PER_SEC
        << "   " << d_answer << '\n';

   return 0;
}
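
If you want to time division itself rather than complex multiplication, the same dependency idea can be sketched as follows. This is only an illustration under some assumptions: the helper name time_dependent_division and the recurrence x = numerator / (x + 1) are made up for this example, and std::chrono::steady_clock is swapped in for clock() to get a monotonic, finer-grained timer. Each iteration also performs an addition, so the result reflects the latency of a divide plus an add, not the divide alone.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original post): performs n divisions in
// which every quotient feeds the next iteration, then returns the elapsed
// time in seconds together with the final value.
template <typename T>
std::pair<double, T> time_dependent_division(T seed, T numerator, unsigned int n)
{
    using steady = std::chrono::steady_clock;

    T x = seed;
    const auto start = steady::now();
    for (unsigned int i = 0; i < n; ++i)
    {
        // Loop-carried dependency: this division needs the previous result.
        x = numerator / (x + T(1));
    }
    const auto stop = steady::now();

    // duration<double> with the default period counts seconds.
    return { std::chrono::duration<double>(stop - start).count(), x };
}
```

Calling time_dependent_division<float>(0.5f, 2.0f, 200000000) and time_dependent_division<double>(0.5, 2.0, 200000000) from main, and printing both members of each result, gives a comparison in the same spirit as the program above. With a numerator of 2 the recurrence settles near 1, so the values stay finite. Printing the final value matters: a result that is never used is exactly what the optimizer removed in the original program.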
