Let’s say I want to check whether two numbers a and b are equal. Because of imprecision with floating points, I know that instead of simply checking a == b, I usually want to pick some small number eps and check instead that abs(a - b) < eps.
But what do I do if I want to take into account floating point errors when checking that a > b? I’m guessing that instead of simply
if (a > b) {
...
}
I want to do something like:
if ((a > b) || abs(a - b) < eps) {
...
}
Is this correct? How do I check that a is “approximately greater than” b?
You are asking how to calculate a correct result (whether one value is greater than another value) from incorrect input (some values that have errors in them). Obviously, this is impossible in general: Incorrect input produces incorrect output. However, in some specific situations, we can salvage something. The following discusses one situation.
Let’s suppose you have calculated some `a` and `b` (code font) that approximate the ideal values a and b (plain text), where a and b are the results you would have if the calculations were done with exact mathematics. Also suppose that we know error bounds ea and eb such that a – ea ≤ `a` ≤ a + ea and b – eb ≤ `b` ≤ b + eb. In other words, the calculated `a` and `b` lie within some intervals around a and b, respectively. (Depending on the operations performed, it is possible that errors could cause `a` or `b` to lie in some unconnected intervals, possibly not even containing a or b. But we will suppose you have “well behaved” errors.)
In that case, if `a` – ea > `b` + eb, then you can be certain that a > b.
However, suppose you test for this condition and return `true` if it holds. Then, whenever this returns `true`, you will know that a > b. However, when it returns `false`, you will not be sure that a > b is false. So, this test is good if you want to perform some action only when you are certain that a > b. But this causes you to miss performing the action in some cases when a > b.
Suppose you do not want to miss any of those cases. Then consider the condition `a` + ea > `b` – eb. If a > b, then this condition must be true. So, if you test for this condition and perform the desired action when it holds, then the action will always be performed when a > b. However, the action may also be performed sometimes when it is not true that a > b.
This shows that you have choices to make. If you have errors in your calculations, sometimes your application will do the wrong thing. You must choose which mistake is more acceptable: sometimes skipping the action when a > b, or sometimes performing it when a > b is false.
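The two one-sided tests described above can be sketched in C as follows. This is only an illustration: the function names `certainly_greater` and `possibly_greater` are hypothetical, and the sketch assumes you already know non-negative error bounds `ea` and `eb` for the calculated `a` and `b`. It also ignores any rounding in computing `a - ea`, `b + eb`, and so on.

```c
#include <stdbool.h>

/* Hypothetical helpers. a and b are the calculated values; ea and eb are
   assumed non-negative bounds on how far each may be from its ideal value.
   Rounding in the bound arithmetic itself is ignored in this sketch. */

/* Returns true only when the ideal a is certain to exceed the ideal b.
   A false result does not tell you that a > b is false. */
bool certainly_greater(double a, double ea, double b, double eb)
{
    return a - ea > b + eb;
}

/* Returns true whenever the ideal a might exceed the ideal b.
   A true result does not tell you that a > b is true. */
bool possibly_greater(double a, double ea, double b, double eb)
{
    return a + ea > b - eb;
}
```

For example, with `a = 1.0`, `b = 0.875`, and both bounds equal to `0.125`, `certainly_greater` returns false while `possibly_greater` returns true: the intervals overlap, so neither ordering can be ruled out.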
If you can find some satisfactory compromise, then you set your condition to some intermediate level, and you test for the condition `a-b > e`, for some `e` that lies between – ea – eb and + ea + eb, inclusive. If you cannot find a satisfactory compromise, then you need to improve the calculations of `a` and `b` to reduce the errors, or you need to redesign your program in some way.
Note: The final test in this scenario is `a-b > e` rather than `a > b+e` because there may be a small rounding error calculating `b+e`. There may also be a rounding error calculating `a-b`, but only if `a` and `b` are not near each other, in which case the difference, even with rounding, is much larger than `e` (unless your error interval is atrocious). In the cases where we care about precision, when `a` is near `b`, the calculation of `a-b` is exact.
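As a rough sketch of the compromise test, the C code below contrasts the subtract-first form with the add-first form. The names `greater_with_threshold` and `greater_add_first` are hypothetical, and the threshold `e` is assumed to be chosen by the caller from the range described above. The claim that `a - b` is exact for nearby values is the standard result often called Sterbenz's lemma (exact when b/2 ≤ a ≤ 2b); the behavior noted afterward assumes IEEE 754 double precision with round-to-nearest, ties-to-even.

```c
#include <stdbool.h>

/* Hypothetical helper: preferred form. When a and b are close,
   a - b is computed without rounding error, so only e matters. */
bool greater_with_threshold(double a, double b, double e)
{
    return a - b > e;
}

/* Hypothetical helper, shown only for contrast: b + e may round,
   which can change the outcome when a is very near b. */
bool greater_add_first(double a, double b, double e)
{
    double shifted = b + e;  /* rounded to double here */
    return a > shifted;
}
```

Under those assumptions, take `a = 1.0`, `b = 1.0`, and `e = -0x1p-54` (that is, –2^–54). The first function returns true, since 0 > e. In the second, `b + e` falls exactly halfway between two doubles and rounds back to `1.0`, so the function returns false.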