I’m working on an application that does a lot of floating-point calculations. We use

Question

Asked: May 13, 2026 (2026-05-13T17:44:22+00:00)

I’m working on an application that does a lot of floating-point calculations. We use VC++ on Intel x86 with double precision floating-point values. We make claims that our calculations are accurate to n decimal digits (right now 7, but trying to claim 15).

We go to a lot of effort to validate our results against other sources when they change slightly (due to code refactoring, cleanup, etc.). I know that many factors play into the overall precision, such as the FPU control state, the compiler/optimizer, the floating-point model, and the order of operations (i.e., the algorithm itself), but given the inherent uncertainty in FP calculations (e.g., 0.1 cannot be represented exactly), it seems invalid to claim any specific degree of precision for all calculations.

My question is this: is it valid to make any claims about the accuracy of FP calculations in general without doing any sort of analysis (such as interval analysis)? If so, what claims can be made and why?

EDIT:

So given that the input data is accurate to, say, n decimal places, can any guarantee be made about the result of any arbitrary calculations, given that double precision is being used? E.g., if the input data has 8 significant decimal digits, the output will have at least 5 significant decimal digits… ?

We are using math libraries and are unaware of any guarantees they may or may not make. The algorithms we use are not necessarily analyzed for precision in any way. But even given a specific algorithm, the implementation will affect the results (just changing the order of two addition operations, for example). Is there any inherent guarantee whatsoever when using, say, double precision?

ANOTHER EDIT:

We do empirically validate our results against other sources. So are we just getting lucky when we achieve, say, 10-digit accuracy?


1 Answer

Editorial Team · Answer 1 · 2026-05-13T17:44:23+00:00

Unless your code uses only the basic operations specified in IEEE 754 (+, -, *, / and square root), you do not even know how much precision is lost in each call to a library function outside your control (trigonometric functions, exp/log, …). The basic five are required to be correctly rounded; functions outside them are not guaranteed to be accurate to 1 ULP, and usually are not.
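To put a number on "accurate to 1 ULP", one option is to count how many representable doubles lie between a computed value and a trusted reference. The following is only a minimal sketch and not part of the original answer: the names `ordered_bits` and `ulp_distance` are made up for illustration, and it assumes 64-bit IEEE 754 doubles and non-NaN inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a double as an integer whose ordering
   matches the ordering of the doubles. Assumes IEEE 754 binary64. */
int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);   /* copy the bit pattern without aliasing issues */
    /* Negative doubles are stored sign-magnitude; map them below zero so
       that -0.0 and +0.0 both land on 0. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles separating a and b (0 if they are equal). */
uint64_t ulp_distance(double a, double b)
{
    assert(a == a && b == b);   /* NaN has no meaningful distance */
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    /* Unsigned subtraction avoids signed overflow for far-apart values. */
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```

Comparing the result of a library call against a reference computed in higher precision and then rounded to double would give that call's error as a count of ULPs, which is still an empirical measurement rather than a guarantee.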

You can do empirical checks, but that’s what they remain… empirical. Don’t forget the part about there being no warranty in the EULA of your software!

If your software were safety-critical, and did not call library-implemented mathematical functions, you could consider http://www-list.cea.fr/labos/gb/LSL/fluctuat/index.html . But only critical software is worth the effort and has a chance of fitting within the analysis constraints of this tool.

You seem, after your edit, mostly concerned about your compiler doing things behind your back. It is a natural fear to have (because, as with the mathematical functions, you are not in control). But it’s rather unlikely to be the problem. Your compiler may compute with a higher precision than you asked for (80-bit extended precision when you asked for 64-bit doubles, or 64-bit doubles when you asked for 32-bit floats). This is allowed by the C99 standard. In round-to-nearest, this may introduce double-rounding errors. But it’s only 1 ULP you are losing, and so infrequently that you needn’t worry. This can cause puzzling behaviors, as in:

float x=1.0;
float y=7.0;
float z=x/y;
if (z == x/y) 
...
else
... /* the else branch is taken */

but you were looking for trouble when you used == between floating-point numbers.
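If the comparison matters, a tolerance-based test is the usual workaround, and C99's `FLT_EVAL_METHOD` macro from `<float.h>` reports whether intermediate results are kept in a wider format. A minimal sketch follows. The names `evaluation_method` and `nearly_equal` and the choice of a relative tolerance are illustrative assumptions, and a suitable tolerance depends on the computation.

```c
#include <assert.h>
#include <float.h>

/* C99 evaluation method: 0 = each type evaluated in its own precision,
   1 = float and double evaluated as double, 2 = everything evaluated as
   long double, negative = indeterminable. Falls back to -1 when the
   macro is not provided (assumption: treat that as "unknown"). */
int evaluation_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}

/* Hypothetical helper: true when a and b differ by at most rel_tol times
   the larger of their magnitudes. Any comparison involving NaN is false. */
int nearly_equal(double a, double b, double rel_tol)
{
    assert(rel_tol >= 0.0);
    if (a == b)
        return 1;                        /* also covers equal infinities */
    double diff = a > b ? a - b : b - a;
    if (!(diff <= DBL_MAX))
        return 0;                        /* infinite or NaN difference */
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;
    return diff <= rel_tol * scale;
}
```

With this, the test in the snippet above could be written as `nearly_equal(z, x/y, tol)` for some small `tol`, which does not depend on whether `x/y` was kept in a wider register.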

When you have code that does cancelations on purpose, such as in Kahan’s summation algorithm:

d = (a+b)-a-b;

and the compiler optimizes that into d=0;, you have a problem. And yes, this optimization “as if floating-point operations were associative” has been seen in general-purpose compilers. It is not allowed by C99. But the situation has gotten better, I think. Compiler authors have become more aware of the dangers of floating-point and no longer try to optimize so aggressively. Plus, if you were doing this in your code you would not be asking this question.
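For reference, here is a minimal sketch of compensated (Kahan) summation next to a plain loop, showing where the deliberate cancellation sits. The function names are mine, and it assumes the compiler is not allowed to reassociate floating-point operations (options such as GCC's `-ffast-math` or MSVC's `/fp:fast` permit that and can defeat the compensation).

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation, for comparison. */
double naive_sum(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}

/* Kahan (compensated) summation: c carries the rounding error of the
   previous addition and feeds it back into the next one. */
double kahan_sum(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double sum = 0.0;
    double c = 0.0;                 /* running compensation */
    for (size_t i = 0; i < n; ++i) {
        double y = x[i] - c;        /* apply the correction from last step */
        double t = sum + y;         /* low-order bits of y may be lost here */
        c = (t - sum) - y;          /* same shape as (a+b)-a-b: recovers the
                                       lost part; algebraically zero, which
                                       is why reassociation breaks it */
        sum = t;
    }
    return sum;
}
```

Summing an array of many copies of 0.1 with both functions is an easy way to see the difference: the compensated result should stay much closer to the exact sum than the plain loop does.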
