In Visual Studio 2010, when I enable an enhanced instruction set for the following code, the execution time actually increases.
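#include <cstdlib>   // std::rand, system
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC

// Note: despite its name, this function multiplies the inputs element-wise.
void add(float * input1, float * input2, float * output, int size)
{
    for(int iter = 0; iter < size; iter++)
    {
        output[iter] = input1[iter] * input2[iter];
    }
}

int main()
{
    const int SIZE = 10000000;
    float *in1 = new float[SIZE];
    float *in2 = new float[SIZE];
    float *out = new float[SIZE];
    // Fill all three arrays with pseudo-random values.
    for(int iter = 0; iter < SIZE; iter++)
    {
        in1[iter] = std::rand();
        in2[iter] = std::rand();
        out[iter] = std::rand();
    }
    // Time 100 passes over the arrays.
    clock_t start = clock();
    for(int iter = 0; iter < 100; iter++)
    {
        add(in1, in2, out, SIZE);
    }
    clock_t end = clock();
    // clock_t values are subtracted directly; difftime() is meant for time_t.
    double time = (double)(end - start) / CLOCKS_PER_SEC;
    system("PAUSE");
    delete[] in1;
    delete[] in2;
    delete[] out;
    return 0;
}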
I consistently get about 2.0 seconds for the time variable with SSE2 enabled, but about 1.7 seconds when the option is “Not Set”. I am building on 64-bit Windows 7 with Visual Studio 2010 Professional, Release configuration, optimized for speed.
Is there any explanation for why enabling SSE causes longer execution time?
SSE code has an overhead for moving values into and out of the SSE registers, and that overhead can outweigh the performance benefit of SSE when you perform only a few simple calculations per value. That is the case in your example: each multiplication needs two loads and one store.
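To make the load/store traffic concrete, here is a minimal sketch of the same element-wise product written by hand with SSE intrinsics. The function name `multiply_sse` and the `HAVE_SSE` macro are made up for this illustration, and this is not necessarily what the compiler generates for the loop in the question; it only shows that the packed form still performs two loads and one store around a single arithmetic instruction.

```cpp
#include <cassert>

// Use the intrinsics only where SSE is known to be available;
// HAVE_SSE is a hypothetical macro defined just for this sketch.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Hypothetical helper (not from the question): element-wise product,
// processing four floats per iteration when SSE is available.
void multiply_sse(const float* input1, const float* input2, float* output, int size)
{
    assert(size >= 0);
    int iter = 0;
#if HAVE_SSE
    for (; iter + 4 <= size; iter += 4)
    {
        __m128 a = _mm_loadu_ps(input1 + iter);  // load 4 floats (memory -> register)
        __m128 b = _mm_loadu_ps(input2 + iter);  // load 4 floats (memory -> register)
        __m128 product = _mm_mul_ps(a, b);       // the only arithmetic instruction
        _mm_storeu_ps(output + iter, product);   // store 4 floats (register -> memory)
    }
#endif
    // Scalar tail for the last size % 4 elements (or the whole array without SSE).
    for (; iter < size; iter++)
    {
        output[iter] = input1[iter] * input2[iter];
    }
}
```

It can be called exactly like the original function, for example `multiply_sse(in1, in2, out, SIZE);`. The unaligned variants `_mm_loadu_ps` and `_mm_storeu_ps` are used so that the sketch works for any pointer.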
Also note that this overhead becomes significantly larger if your data is not 16-byte aligned.
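If you want to check or control alignment, something along these lines may help. It is only a sketch: `is_aligned_16` and `AlignedFloatBuffer` are hypothetical names introduced here, and the buffer obtains 16-byte alignment portably by over-allocating and rounding the address up, which is one approach among several.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: true if the address is a multiple of 16 bytes.
bool is_aligned_16(const void* ptr)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0;
}

// Hypothetical buffer class: allocates 15 spare bytes and rounds the start
// address up to the next 16-byte boundary.
class AlignedFloatBuffer
{
public:
    explicit AlignedFloatBuffer(std::size_t count)
        : raw_(static_cast<unsigned char*>(std::malloc(count * sizeof(float) + 15))),
          data_(nullptr)
    {
        if (raw_ == nullptr)
        {
            throw std::bad_alloc();
        }
        std::uintptr_t address = reinterpret_cast<std::uintptr_t>(raw_);
        std::uintptr_t aligned = (address + 15) & ~static_cast<std::uintptr_t>(15);
        data_ = reinterpret_cast<float*>(raw_ + (aligned - address));
        assert(is_aligned_16(data_));
    }

    ~AlignedFloatBuffer() { std::free(raw_); }

    float* data() { return data_; }

private:
    AlignedFloatBuffer(const AlignedFloatBuffer&);             // non-copyable
    AlignedFloatBuffer& operator=(const AlignedFloatBuffer&);  // non-copyable

    unsigned char* raw_;  // pointer returned by malloc, passed to free
    float* data_;         // 16-byte aligned start of the float storage
};
```

In the question's `main`, `AlignedFloatBuffer in1(SIZE);` followed by `in1.data()` would stand in for `new float[SIZE]`. On Visual Studio, `_aligned_malloc(bytes, 16)` paired with `_aligned_free` achieves the same thing without the manual rounding. Once the data is known to be aligned, the aligned intrinsics `_mm_load_ps` and `_mm_store_ps` can be used instead of the unaligned `_mm_loadu_ps` and `_mm_storeu_ps`.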