Say I want a function that takes two floats (x and y), and I want to compare them using not their float representation but rather their bitwise representation as a 32-bit unsigned int. That is, a number like -459.5 has bit representation 0b11000011111001011100000000000000 or 0xC3E5C000 as a float, and I have an unsigned int with the same bit representation (corresponding to a decimal value 3286614016, which I don’t care about). Is there any easy way for me to perform an operation like <= on these floats using only the information contained in their respective unsigned int counterparts?
You must do a signed compare unless you ensure that all the original values were positive. You must use an integer type that is the same size as the original floating point type. Each chip may have a different internal format, so comparing values from different chips as integers is most likely to give misleading results.
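As a minimal sketch of the "same size" point, assuming a platform where `float` is a 32-bit IEEE 754 binary32 value (the helper name `float_bits` is made up for illustration), the bit pattern can be copied into a `uint32_t` with `memcpy`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32) on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: return the raw bit pattern of f as an unsigned integer. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* copies the bytes; avoids pointer-cast aliasing problems */
    return u;
}
```

Under that assumption, `float_bits(-459.5f)` yields `0xC3E5C000`, the bit pattern quoted in the question.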
Most float formats look something like this: `sxxxmmmm`

- `s` is a sign bit
- `xxx` is an exponent
- `mmmm` is the mantissa

The value represented will then be something like `1mmmm << (xxx - k)`, with `1mmmm` because there is an implied leading `1` bit unless the value is zero. If `xxx < k`, it becomes a right shift. `k` is near, but not equal to, half the largest value that could be expressed by `xxx`. It is adjusted for the size of the mantissa.

All to say that, disregarding `NaN`, comparing floating point values as signed integers of the same size will yield meaningful results: the exponent sits above the mantissa, so a larger magnitude always has a larger bit pattern. One caveat: these formats are sign-magnitude rather than two's complement, so when both values are negative the integer comparison comes out reversed and has to be inverted, and `-0.0` and `+0.0` have different bit patterns even though they compare equal as floats. The formats are designed this way so that floating point comparisons are no more costly than integer comparisons. There are compiler optimizations to turn off `NaN` checks so that the comparisons are straight integer comparisons if the floating point format of the chip supports it.

As an integer, `NaN` (with a clear sign bit) is greater than infinity, which is greater than every finite value. If you try an unsigned compare, all the negative values will be larger than the positive values, just like signed integers cast to unsigned.
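For the question's case, where only the `unsigned int` bit patterns are available, one possible sketch is to map each pattern to a key that orders correctly under a plain unsigned compare. This assumes IEEE 754 binary32 and that neither input is a `NaN`; the names `order_key` and `bits_less_equal` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map a binary32 bit pattern to a key whose unsigned ordering matches
   the float ordering. Assumption: the pattern is not a NaN. */
static uint32_t order_key(uint32_t bits)
{
    if (bits & 0x80000000u)
        return ~bits;           /* negative: invert so larger magnitudes sort lower */
    return bits | 0x80000000u;  /* non-negative: set the top bit so it sorts above negatives */
}

/* Hypothetical helper: x <= y, using only the bit patterns. */
static bool bits_less_equal(uint32_t x, uint32_t y)
{
    return order_key(x) <= order_key(y);
}
```

Note that this sketch treats `-0.0` (`0x80000000`) as strictly less than `+0.0` (`0x00000000`), whereas a floating point compare considers them equal; if that matters, the two zero patterns need a special case.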