It seems that a bitwise & between two longs takes the same amount of time as the equivalent operation on four 32-bit ints.
For example,
long1 & long2
takes as long as
int1 & int2
int3 & int4
This is running on a 64-bit OS and targeting 64-bit .NET.
In theory, the long version should be twice as fast. Has anyone encountered this previously?
EDIT
As a simplification, imagine I have two lots of 64 bits of data. I take those 64 bits and put them into a long, and perform a bitwise & on those two.
I also take those two sets of data, and put the 64 bits into two 32 bit int values and perform two &s. I expect to see the long & operation running faster than the int & operation.
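A minimal sketch of the two approaches being compared. The Pack helper and the sample values are hypothetical, chosen only for illustration; the point is that one 64-bit & gives the same bits as two 32-bit &s on the halves.

```csharp
using System;

static class PackDemo
{
    // Hypothetical helper: combine two 32-bit halves into one 64-bit value,
    // with hi in the upper 32 bits and lo in the lower 32 bits.
    public static long Pack(int hi, int lo) => ((long)hi << 32) | (uint)lo;

    public static void Main()
    {
        // First 64 bits of data, split into two ints (illustrative values).
        int a1 = 0x12345678, a2 = unchecked((int)0xF0F0F0F0);
        // Second 64 bits of data, split the same way.
        int b1 = 0x0FF00FF0, b2 = 0x3C3C3C3C;

        // One 64-bit AND on the packed values...
        long viaLong = Pack(a1, a2) & Pack(b1, b2);
        // ...versus two 32-bit ANDs, packed afterwards for comparison.
        long viaInts = Pack(a1 & b1, a2 & b2);

        Console.WriteLine(viaLong == viaInts); // prints True
    }
}
```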
I couldn’t reproduce the problem.
My test was as follows (int version shown):
For testing longs, a1 and a2 etc. are merged, giving:
Running the two programs on my laptop (i7 Q720) as a release build outside of VS (.NET 4.5), I got the following times:
int: 2238, long: 1924
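A comparison of this kind can be sketched roughly as follows. This is not the code that produced the timings above: the class name, method names, element count, and pass count are all assumptions, and the numbers it prints will vary by machine and build.

```csharp
using System;
using System.Diagnostics;

static class AndBench
{
    // AND every element into an accumulator; returning the result keeps
    // the JIT from discarding the loop as dead code.
    public static int AndInts(int[] data, int passes)
    {
        int acc = -1;
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < data.Length; i++)
                acc &= data[i];
        return acc;
    }

    // Same loop over 64-bit elements.
    public static long AndLongs(long[] data, int passes)
    {
        long acc = -1L;
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < data.Length; i++)
                acc &= data[i];
        return acc;
    }

    public static void Main()
    {
        const int Count = 1 << 20; // assumed: ~4 MB of ints, ~8 MB of longs
        const int Passes = 500;    // assumed repeat count

        var ints = new int[Count];
        var longs = new long[Count];
        for (int i = 0; i < Count; i++) { ints[i] = -1; longs[i] = -1L; }

        // Warm up so JIT compilation is not included in the timings.
        AndInts(ints, 1);
        AndLongs(longs, 1);

        var sw = Stopwatch.StartNew();
        int r1 = AndInts(ints, Passes);
        sw.Stop();
        Console.WriteLine("int:  " + sw.ElapsedMilliseconds + " ms (result " + r1 + ")");

        sw.Restart();
        long r2 = AndLongs(longs, Passes);
        sw.Stop();
        Console.WriteLine("long: " + sw.ElapsedMilliseconds + " ms (result " + r2 + ")");
    }
}
```

Build it in release mode and run it outside the debugger, since an attached debugger can disable JIT optimisations.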
Now, considering there's a huge amount of loop overhead, and that the long version is working with twice as much data (8 MB vs 4 MB), it still comes out clearly ahead. So I have no reason to believe that C# is not making full use of the processor's 64-bit bitops.
But we really shouldn't be benching it in the first place. If there's a concern, simply check the JITed code (Debug -> Windows -> Disassembly). Ensure the compiler's using the instructions you expect it to use, and move on.
Attempting to measure the performance of those individual instructions on your processor (and this could well be specific to your processor model) in anything other than assembler is a very bad idea, and from within a JIT-compiled language like C#, beyond futile. But there's no need to anyway, as it's all in Intel's optimisation handbook should you need to know.
To this end, here's the disassembly of the a &= for the long version of the program on x64 (release, but inside of debugger; unsure if this affects the assembly, but it certainly affects the performance):
As you can see, there's a single 64-bit and operation as expected, along with three 64-bit moves. So far so good, and exactly half the number of ops of the int version:
I can only conclude that the problem you're seeing is specific to something about your test suite, build options, processor… or quite possibly, that the & isn't the point of contention you believe it to be. HTH.