Imagine I have two byte[] arrays, b1 and b2, and they have the bytes corresponding to two doubles.
One option would be something like…
double thisValue = readDouble(b1, s1);
double thatValue = readDouble(b2, s2);
return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
which uses…
/** Parse an integer from a byte array. */
public static int readInt(byte[] bytes, int start) {
return (((bytes[start ] & 0xff) << 24) +
((bytes[start+1] & 0xff) << 16) +
((bytes[start+2] & 0xff) << 8) +
((bytes[start+3] & 0xff)));
}
/** Parse a long from a byte array. */
public static long readLong(byte[] bytes, int start) {
return ((long)(readInt(bytes, start)) << 32) +
(readInt(bytes, start+4) & 0xFFFFFFFFL);
}
/** Parse a double from a byte array. */
public static double readDouble(byte[] bytes, int start) {
return Double.longBitsToDouble(readLong(bytes, start));
}
(code taken from apache hadoop source here and here).
The thing is, you have their byte representations, so it seems wasteful have to actually case them into a double, although maybe this is so heavily optimized as to be negligible. I am sure the Hadoop people know what they are doing, I’m just curious why it wouldn’t be better/faster to just compare the bits directly? Or maybe the compiler is smart enough to see this sort of thing and do just that.
Thanks
Due to the structure of the IEEE floating-point format, you can’t simply check if all of the bits are identical: for example, -0 and +0 have distinct representations, but are considered equal; and NaN values, which have many different representations, are never equal to anything, including other NaN values with the same representation.
While it is certainly possible to implement these checks yourself, it quickly becomes very complex, and not worth it: the “subvalues” you need to check do not have their own bytes, so you still have to extract the bytes and throw them into larger values – and then you have to actually check all of the different conditions.
In other words, you end up doing the same things that the above piece of code is doing, but you’re spending many more lines of code, and you’re very unlikely to perform any better than what’s already there.