I’ve found that != and == are not the fastest ways to test for zero or non-zero.
bool nonZero1 = integer != 0;
xor eax, eax
test ecx, ecx
setne al
bool nonZero2 = integer < 0 || integer > 0;
test ecx, ecx
setne al
bool zero1 = integer == 0;
xor eax, eax
test ecx, ecx
sete al
bool zero2 = !(integer < 0 || integer > 0);
test ecx, ecx
sete al
Compiler: VC++ 11
Optimization flags: /O2 /GL /LTCG
This is the assembly output for x86-32. The second versions of both comparisons were ~12% faster on both x86-32 and x86-64. However, on x86-64 the instructions were identical (first versions looked exactly like the second versions), but the second versions were still faster.
- Why doesn’t the compiler generate the faster version on x86-32?
- Why are the second versions still faster on x86-64 when the assembly output is identical?
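For reference, the two spellings of each test are logically equivalent for every int value, which is why a compiler is free to emit the same instructions for both. The sketch below wraps the four expressions from the question in functions and checks that they agree; `formsAgree` is a hypothetical helper added for illustration and is not part of the original benchmark.

```cpp
#include <cassert>
#include <climits>

// The four expressions from the question, wrapped in functions.
bool nonZero1(int integer) { return integer != 0; }
bool nonZero2(int integer) { return integer < 0 || integer > 0; }
bool zero1(int integer)    { return integer == 0; }
bool zero2(int integer)    { return !(integer < 0 || integer > 0); }

// Hypothetical helper (not in the original post): returns true when
// both spellings of each test give the same answer for `value`, and
// the zero test is the exact negation of the non-zero test.
bool formsAgree(int value) {
    return nonZero1(value) == nonZero2(value)
        && zero1(value) == zero2(value)
        && zero1(value) != nonZero1(value);
}

// Hypothetical helper: checks a handful of boundary values.
bool formsAgreeOnSamples() {
    const int samples[] = { INT_MIN, -1, 0, 1, INT_MAX };
    for (int value : samples) {
        if (!formsAgree(value)) {
            return false;
        }
    }
    return true;
}
```

Calling `formsAgreeOnSamples()` should return true on any conforming compiler, so any speed difference between the two spellings has to come from code generation or from the measurement itself, not from the semantics.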
EDIT: I’ve added benchmarking code. Timings (first version, second version): ZERO: 1544 ms, 1358 ms; NON_ZERO: 1544 ms, 1358 ms
http://pastebin.com/m7ZSUrcP
or
http://anonymouse.org/cgi-bin/anon-www.cgi/http://pastebin.com/m7ZSUrcP
Note: It’s probably inconvenient to locate these functions when everything is compiled from a single source file, because main.asm gets quite big. I had zero1, zero2, nonZero1, nonZero2 in a separate source file.
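For readers who cannot reach the pastebin, a timing comparison of this kind might be structured roughly as below. This is only a sketch and not the linked benchmark: `Timing` and `timePredicate` are hypothetical names, and the use of `std::chrono::steady_clock` is an assumption about how one could measure elapsed time. Returning the hit count alongside the time lets both variants be checked for identical results.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

bool zero1(int integer) { return integer == 0; }
bool zero2(int integer) { return !(integer < 0 || integer > 0); }

// Hypothetical result type: how many inputs passed the test, and how
// long the whole run took in milliseconds.
struct Timing {
    std::size_t hits;
    long long milliseconds;
};

// Hypothetical harness: applies `test` to every element of `data`,
// repeated `rounds` times, and reports the hit count and elapsed time.
template <typename Predicate>
Timing timePredicate(Predicate test, const std::vector<int>& data, int rounds) {
    assert(rounds > 0);
    std::size_t hits = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int round = 0; round < rounds; ++round) {
        for (int value : data) {
            hits += test(value) ? 1 : 0;  // consume the result so the work is not dropped
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    return Timing{ hits, static_cast<long long>(elapsed.count()) };
}
```

A call such as `timePredicate(zero1, data, rounds)` followed by `timePredicate(zero2, data, rounds)` should produce equal `hits`. Repeating the pair in the opposite order may help show whether a gap between the two timings follows the function or the position in the run.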
EDIT2: Could someone with both VC++11 and VC++2010 installed run the benchmarking code and post the timings? It might indeed be a bug in VC++11.
Just compiled the sources with suitable modifications to my ne.c file and the /O2 and /GL flags. Here’s the source and the corresponding assembly:
ne2(), which used the <, > and || operators, is clearly more expensive. ne1() and ne3(), which use the == and != operators respectively, are terser and equivalent.
Visual Studio 2011 is in beta. I would consider this a bug. My tests with two other compilers, namely gcc 4.6.2 and clang 3.2, with the O2 optimization switch yielded the exact same assembly for all three tests (that I had) on my Windows 7 box. Here’s a summary:
yields with gcc:
and with clang:
My suggestion would be to file this as a bug with Microsoft Connect.
Note: I compiled them as C source since I don’t think using the corresponding C++ compiler would make any significant change here.