I was trying to measure the speed difference between single-precision and double-precision division in C++.
Here is the simple code that I have written.
#include <iostream>
#include <time.h>
int main(int argc, char *argv[])
{
    float f_x = 45672.0;
    float f_y = 67783.0;
    double d_x = 45672.0;
    double d_y = 67783.0;
    float f_answer;
    double d_answer;
    clock_t start, stop;
    int N = 200000000; // 2 * 10^8
    // Time N single-precision divisions.
    start = clock();
    for (int i = 0; i < N; ++i)
    {
        f_answer = f_x / f_y;
    }
    stop = clock();
    std::cout << "Single precision:" << (stop - start) / (double)CLOCKS_PER_SEC << " " << f_answer << std::endl;
    // Time N double-precision divisions.
    start = clock();
    for (int i = 0; i < N; ++i)
    {
        d_answer = d_x / d_y;
    }
    stop = clock();
    std::cout << "Double precision:" << (stop - start) / (double)CLOCKS_PER_SEC << " " << d_answer << std::endl;
    return 0;
}
When I compiled the code without optimization, as g++ test.cpp, I got the following output:
Desktop: ./a.out
Single precision:8.06 0.673797
Double precision:12.68 0.673797
But if I compile it with g++ -O3 test.cpp, then I get:
Desktop: ./a.out
Single precision:0 0.673797
Double precision:0 0.673797
How did I get such a drastic performance increase? The time being shown in the second case is 0 because of the low resolution of the clock() function. Did the compiler somehow detect that each for loop iteration is independent of the previous iterations?
Looking at the assembly that you get from g++ -O3 -S, it's quite apparent that the loops and all of your floating-point calculations (aside from those involving the time) were optimized out of existence:
See the two calls to clock, one right after the other? And before those, only some stack maintenance instructions. Yep, those loops are completely gone.
You only use f_answer or d_answer to print out an answer that can be trivially calculated at compile time, and the compiler can see that. There's no point in even having them. And if there's no point in having them, there's no point in having f_x, f_y, d_x, or d_y either. All gone.
To solve this, you need to have each iteration of the loop depend on the results from the last iteration. Here is my solution to this problem. I use the complex template to do some calculations involved in calculating the Mandelbrot set: