OK, so I know you’re generally not supposed to compare two floating-point numbers for equality. However, in William Kahan’s “How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?” he shows the following code (pseudo-code, I believe):
Real Function T(Real z) :
T := exp(z) ; ... rounded, of course.
If (T = 1) Return( T ) ; ... when |z| is very tiny.
If (T = 0) Return( T := –1/z ) ; ... when exp(z) underflows.
Return( T := ( T – 1 )/log(T) ) ; ... in all other cases.
End T .
Now, I’m interested in implementing this in C or C++, and I have two related questions:
a) if I take T to be a double, then in the comparison (T == 1) or (T == 0) would 0 and 1 get converted to double to preserve the precision of the values involved in a multi-type expression?
b) does this still count as comparing two floating-point numbers for equality?
Yes and yes.
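For 32-bit ints, double can represent every value precisely. When you compare a double to a 64-bit int, however, there is potential roundoff error if the int’s magnitude is greater than 2^53, because a typical IEEE 754 double has a 53-bit significand. You can use long double instead, though the standard only guarantees that it is at least as precise as double; on platforms where it is the x87 80-bit extended format, it has a 64-bit mantissa and can hold every 64-bit int exactly.
Of course, the best way is just to use a floating-point literal: 1.0 or just 1. has type double, 1.0f is a float, and my_float_type(1) has whatever type it’s supposed to :v) .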
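For what it’s worth, here is a minimal sketch of how the pseudo-code might look in C++ with floating-point literals in the comparisons. The names kahan_T and collides_with_next are made up for illustration and do not come from Kahan’s paper, and the second helper assumes double is an IEEE 754 binary64 type with the default round-to-nearest mode. Mathematically the last branch equals (exp(z) - 1)/z, since log(T) recovers z.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical C++ rendering of the pseudo-code from the question.
// The exact comparisons are deliberate: they test for the two special
// values that the rounded exp(z) can actually take.
double kahan_T(double z) {
    const double t = std::exp(z);      // rounded, of course
    if (t == 1.0) return t;            // |z| is very tiny
    if (t == 0.0) return -1.0 / z;     // exp(z) underflowed
    return (t - 1.0) / std::log(t);    // all other cases
}

// Hypothetical helper: true when n and n + 1 convert to the same double,
// i.e. when the int-to-double conversion has started to lose information.
// Assumes a 53-bit significand; the caller must keep n + 1 from overflowing.
bool collides_with_next(std::int64_t n) {
    return static_cast<double>(n) == static_cast<double>(n + 1);
}
```

With these assumptions, kahan_T(0.0) returns exactly 1.0, kahan_T(-1024.0) takes the underflow branch and returns 1.0/1024, and collides_with_next first becomes true at n = 2^53. Writing `t == 1` instead of `t == 1.0` should behave the same, because the int 1 is converted to double before the comparison and small integers convert exactly.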