I’m profiling some C# code. The method below is one of the most expensive ones. For the purpose of this question, assume that micro-optimization is the right thing to do. Is there an approach to improve performance of this method?
Changing the input parameter p to ulong[] would create a macro inefficiency.
    static ulong Fetch64(byte[] p, int ofs = 0)
    {
        unchecked
        {
            ulong result = p[0 + ofs] +
                ((ulong) p[1 + ofs] << 8) +
                ((ulong) p[2 + ofs] << 16) +
                ((ulong) p[3 + ofs] << 24) +
                ((ulong) p[4 + ofs] << 32) +
                ((ulong) p[5 + ofs] << 40) +
                ((ulong) p[6 + ofs] << 48) +
                ((ulong) p[7 + ofs] << 56);
            return result;
        }
    }
Why not use BitConverter? I’ve got to believe that Microsoft has spent some time tuning that code. Plus it handles endianness, reading the bytes in the machine’s native order.
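As a minimal sketch of that suggestion, the method could be reduced to a single framework call. The wrapper name Fetch64ViaBitConverter is made up for illustration; BitConverter.ToUInt64(byte[], int) is the real API.

```csharp
using System;

static class FetchDemo
{
    // Hypothetical wrapper name; only BitConverter.ToUInt64 comes from the framework.
    public static ulong Fetch64ViaBitConverter(byte[] p, int ofs = 0)
    {
        // Reads 8 bytes starting at ofs in the machine's native byte order,
        // so the result matches the hand-written Fetch64 only when
        // BitConverter.IsLittleEndian is true.
        return BitConverter.ToUInt64(p, ofs);
    }
}
```

Unlike the original, this throws an ArgumentException-family exception rather than an IndexOutOfRangeException when fewer than 8 bytes are available, so it is worth profiling both before switching.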
Here’s how BitConverter turns a byte[] into a long/ulong (the ulong version converts it as signed and then casts the result to unsigned):
I suspect that doing the conversion one 32-bit word at a time is for 32-bit efficiency. With no 64-bit registers on a 32-bit CPU, dealing with 64-bit ints is a lot more expensive.
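The word-at-a-time idea can be sketched in safe code as follows. This is my own illustration of the technique, not the BitConverter source, and the name Fetch64TwoHalves is hypothetical.

```csharp
static class Fetch64Halves
{
    // Hypothetical name; a safe-code sketch of the two-halves idea.
    public static ulong Fetch64TwoHalves(byte[] p, int ofs = 0)
    {
        unchecked
        {
            // Each half is assembled with 32-bit shifts and ORs only.
            uint lo = (uint)(p[ofs] | (p[ofs + 1] << 8) | (p[ofs + 2] << 16) | (p[ofs + 3] << 24));
            uint hi = (uint)(p[ofs + 4] | (p[ofs + 5] << 8) | (p[ofs + 6] << 16) | (p[ofs + 7] << 24));

            // Only this final step needs 64-bit arithmetic.
            return lo | ((ulong)hi << 32);
        }
    }
}
```

Compared with the original method, seven 64-bit shifts become six 32-bit shifts and one 64-bit shift, which is where any saving on a 32-bit CPU would come from.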
If you know for sure you’re targeting 64-bit hardware, it might be faster to do the conversion in one fell swoop.
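One way to try a single 64-bit read without unsafe code is sketched below. It assumes a runtime that ships System.Buffers.Binary.BinaryPrimitives (.NET Core 2.1 or later, or the System.Memory package); the name Fetch64Span is hypothetical.

```csharp
using System;
using System.Buffers.Binary;

static class Fetch64Single
{
    // Hypothetical name; assumes BinaryPrimitives is available on the target runtime.
    public static ulong Fetch64Span(byte[] p, int ofs = 0)
    {
        // Interprets the 8 bytes starting at ofs as a little-endian ulong,
        // regardless of the machine's native byte order.
        return BinaryPrimitives.ReadUInt64LittleEndian(p.AsSpan(ofs));
    }
}
```

Because the byte order is fixed as little-endian, this keeps the semantics of the original Fetch64 on any hardware. Whether it actually beats the shift-and-add version depends on the JIT and the CPU, so measure it in the same profiler run.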