I optimized an extension method to compare two streams for equality (byte-for-byte) – knowing that this is a hot method I tried to optimize it as far as possible (the streams can reach into multi-megabyte lengths). I essentially came up with the following approach:
[StructLayout(LayoutKind.Explicit)]
struct Converter
{
[FieldOffset(0)]
public Byte[] Byte;
[FieldOffset(0)]
public UInt64[] UInt64;
}
/// <summary>
/// Compares two streams for byte-by-byte equality.
/// </summary>
/// <param name="target">The target stream.</param>
/// <param name="compareTo">The stream to compare the target to.</param>
/// <returns>A value indicating whether the two streams are identical.</returns>
public static bool CompareBytes(this Stream target, Stream compareTo)
{
if (target == null && compareTo == null)
return true;
if (target == null || compareTo == null)
return false;
if (target.Length != compareTo.Length)
return false;
if (object.ReferenceEquals(target, compareTo))
return true;
if (!target.CanRead || !target.CanSeek)
throw new ArgumentOutOfRangeException("target");
if (!compareTo.CanRead || !compareTo.CanSeek)
throw new ArgumentOutOfRangeException("target");
lock (target)
{
lock (compareTo)
{
var origa = target.Position;
var origb = compareTo.Position;
try
{
target.Position = compareTo.Position = 0;
// Shrink the number of comparisons.
var arr1 = new byte[4096];
var convert1 = new Converter() { Byte = arr1 };
var arr2 = new byte[4096];
var convert2 = new Converter() { Byte = arr2 };
int len;
while ((len = target.Read(arr1, 0, 4096)) != 0)
{
if (compareTo.Read(arr2, 0, 4096) != len)
return false;
for (var i = 0; i < (len / 8) + 1; i++)
if (convert1.UInt64[i] != convert2.UInt64[i])
return false;
}
return true;
}
finally
{
target.Position = origa;
compareTo.Position = origb;
}
}
}
}
The problem is that the convert1.UInt64[i] != convert2.UInt64[i] if block (returning false) is being evaluated even when the values are the equal. I checked each on individually, then checked the outcome of the ‘not equals’. I am in pure disbelief:

I have not messed with the instruction pointer – this is how the code executed and the watch pin is live.
Any ideas how this could happen?
The debugger in general has a hard time with this union, it can’t display the array content when I try it. But the core problem is no doubt the +1 in the for() end expression. That indexes the array beyond its last element when len is divisible by 8. The runtime cannot catch this mistake, overlapping the arrays causes the Length property to have a bogus value. What happens next is undefined behavior, you are reading bytes that are not part of the array. A workaround is to make the array 7 bytes longer.
This kind of code is not exactly an optimization, reading and comparing uint64 on a 32-bit machine is expensive, especially when the array isn’t aligned correctly.. About 50% odds for that. A better mousetrap is to use the C runtime memcmp() function, available on any Windows machine:
And use it like this:
Do compare the perf of this with a plain for() loop that compares bytes. The ultimate throttle here is the memory bus bandwidth.