I have a large amount of data to process with math intensive operations on

Asked: May 31, 2026

I have a large amount of data to process with math intensive operations on each data set. Much of it is analogous to image processing. However, since this data is read directly from a physical device, many of the pixel values can be invalid.

This makes NaN’s property of representing values that are not a number and spreading on arithmetic operations very compelling. However, it also seems to require turning off some optimizations such as gcc’s -ffast-math, plus we need to be cross platform. Our current design uses a simple struct that contains a float value and a bool indicating validity.

While it seems NaN was designed with this use in mind,
others think it is more trouble than it is worth. Does anyone have advice based on their more intimate experience with IEEE754 with performance in mind?

1 Answer

Editorial Team · Answer 1 · 2026-05-31T07:11:38+00:00

BRIEF: For the strictest portability, don’t use NaNs. Use a separate valid bit, e.g. a template like Valid<T>. However, if you know that you will only ever run on IEEE 754-2008 machines, and not IEEE 754-1985 (see below), then you may get away with it.

For performance, it is probably faster not to use NaNs on most of the machines that you have access to. However, I have been involved with hardware design of FP on several machines that are improving NaN handling performance, so there is a trend to make NaNs faster, and, in particular, signalling NaNs should soon be faster than Valid<T>.

DETAIL:

Not all floating point formats have NaNs. Not all systems use IEEE floating point. IBM hex floating point can still be found on some machines – actually systems, since IBM now supports IEEE FP on more recent machines.

Furthermore, IEEE floating point itself had compatibility issues with respect to NaNs in IEEE 754-1985. E.g., see Wikipedia, http://en.wikipedia.org/wiki/NaN:

The original IEEE 754 standard from 1985 (IEEE 754-1985) only
described binary floating point formats, and did not specify how the
signaled/quiet state was to be tagged. In practice, the most
significant bit of the significand determined whether a NaN is
signalling or quiet. Two different implementations, with reversed
meanings, resulted.
* most processors (including those of the Intel/AMD x86-32/x86-64 family, the Motorola 68000 family, the AIM PowerPC family, the ARM
family, and the Sun SPARC family) set the signaled/quiet bit to
non-zero if the NaN is quiet, and to zero if the NaN is signaling.
Thus, on these processors, the bit represents an ‘is_quiet’ flag.
* in NaNs generated by the PA-RISC and MIPS processors, the signaled/quiet bit is zero if the NaN is quiet, and non-zero if the
NaN is signaling. Thus, on these processors, the bit represents an
‘is_signaling’ flag.

Thus, if your code may run on older HP machines, or current MIPS machines (which are ubiquitous in embedded systems), you should not depend on a fixed encoding of NaN, but should have a machine-dependent #ifdef for your special NaNs.
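As an alternative to a per-machine #ifdef, one option is to let the standard library supply the platform’s quiet NaN and to classify values by behaviour instead of by bit pattern. This is only a minimal sketch, assuming C++11; invalid_sample and is_invalid are hypothetical names, not anything from the question’s code base.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: returns the platform's own quiet NaN,
// whatever bit encoding that platform uses for it.
inline float invalid_sample() {
    static_assert(std::numeric_limits<float>::has_quiet_NaN,
                  "this float format has no quiet NaN; use a valid bit instead");
    return std::numeric_limits<float>::quiet_NaN();
}

// Hypothetical helper: treats every NaN as "invalid" rather than
// comparing against one fixed bit pattern.
inline bool is_invalid(float v) {
    return std::isnan(v);
}
```

The static_assert turns a platform without NaNs into a compile error rather than a silent misbehaviour. This sketch does not distinguish one special NaN payload from another.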

IEEE 754-2008 standardizes NaN encodings, so this is getting better. It depends on your market.

As for performance: many machines essentially trap, or otherwise take a major hiccup in performance, when performing computations involving NaNs, both SNaNs (which must trap) and QNaNs (which don’t need to trap, i.e. which could be fast, and which are getting faster in some machines as we speak).

I can say with confidence that on older machines, particularly older Intel machines, you did NOT want to use NaNs if you cared about performance. E.g. http://www.cygnus-software.com/papers/x86andinfinity.html says “The Intel Pentium 4 handles infinities, NANs, and denormals very badly. … If you write code that adds floating point numbers at the rate of one per clock cycle, and then throw infinities at it as input, the performance drops. A lot. A huge amount. … NANs are even slower. Addition with NANs takes about 930 cycles. … Denormals are a bit trickier to measure.”

Get the picture? Almost 1000x slower to use a NaN than to do a normal floating point operation? In this case it is almost guaranteed that using a template like Valid<T> will be faster.

However, see the reference to “Pentium 4”? That’s a really old web page. For years people like me have been saying “QNaNs should be faster”, and it has slowly taken hold.

More recently (2009), Microsoft says http://connect.microsoft.com/VisualStudio/feedback/details/498934/big-performance-penalty-for-checking-for-nans-or-infinity “If you do math on arrays of double that contain large numbers of NaN’s or Infinities, there is an order of magnitude performance penalty.”

If I feel impelled, I may go and run a microbenchmark on some machines. But you should get the picture.

This should be changing because it is not that hard to make QNaNs fast. But it has always been a chicken-and-egg problem: hardware guys like those I work with say “Nobody uses NaNs, so we won’t make them fast”, while software guys don’t use NaNs because they are slow. Still, the tide is slowly changing.

Heck, if you are using gcc and want best performance, you turn on optimizations like “-ffinite-math-only … Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.” Similar is true for most compilers.

By the way, you can google like I did, “NaN performance floating point” and check refs out yourself. And/or run your own microbenchmarks.
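A microbenchmark along these lines could look like the following. It is a rough sketch only: time_passes is a hypothetical name, n must be greater than zero, and the numbers will depend heavily on the CPU, the compiler flags, and whether x87 or SSE code is generated.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Returns the seconds spent running `reps` multiply-add passes over `n`
// floats that all start out as `fill`. Finite inputs stay finite and
// NaN inputs stay NaN, so each run exercises one kind of operand.
inline double time_passes(float fill, std::size_t n, int reps) {
    std::vector<float> data(n, fill);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        for (std::size_t i = 0; i < n; ++i) {
            data[i] = data[i] * 0.999f + 0.001f;
        }
    }
    auto t1 = std::chrono::steady_clock::now();

    // Consume the results so the optimizer cannot discard the loops.
    float acc = 0.0f;
    for (float v : data) acc += v;
    volatile float sink = acc;
    (void)sink;

    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling time_passes(1.0f, n, reps) and then time_passes(std::numeric_limits<float>::quiet_NaN(), n, reps) with the same n and reps gives two timings to compare on the machine you actually care about.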

Finally, I have been assuming that you are using a template like

template<typename T> class Valid {
    ...
    bool valid;
    T value;
    ...
};
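To make the elided parts concrete, here is one way such a template might be filled in. Only the valid and value members come from the outline above; the constructors, accessors, and operators are assumptions added for illustration.

```cpp
template <typename T>
class Valid {
    bool valid;
    T value;
public:
    Valid() : valid(false), value() {}            // default-constructed means invalid
    Valid(const T& v) : valid(true), value(v) {}  // wraps a known-good value

    bool is_valid() const { return valid; }
    const T& get() const { return value; }        // caller should check is_valid() first

    // Invalidity spreads through arithmetic, much as a NaN would.
    friend Valid operator+(const Valid& a, const Valid& b) {
        return (a.valid && b.valid) ? Valid(a.value + b.value) : Valid();
    }
    friend Valid operator*(const Valid& a, const Valid& b) {
        return (a.valid && b.valid) ? Valid(a.value * b.value) : Valid();
    }
};
```

With this sketch, Valid<int>(2) + Valid<int>() yields an invalid result, while Valid<int>(2) + Valid<int>(3) yields a valid 5.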

I like templates like this, because they can bring “validity tracking” not just to FP, but also to integer (Valid<int>), etc.

But they can have a big cost. The operations are probably not much more expensive than NaN handling on old machines, but the data density can be really poor. sizeof(Valid<float>) may sometimes be 2*sizeof(float). This bad density may hurt performance much more than the operations involved.
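The density cost is easy to check on your own compiler. In the sketch below, ValidFloat is a hypothetical stand-in for a float-plus-flag pair; the exact size is ABI-dependent, so the code only asserts what the language guarantees.

```cpp
#include <cstddef>

// Hypothetical stand-in for a value-plus-flag pair.
struct ValidFloat {
    bool  valid;
    float value;
};

// The struct must hold both members, so it is always larger than a bare float.
// On ABIs where float is 4-byte aligned, padding after the bool commonly
// brings the total to 8 bytes, i.e. 2 * sizeof(float).
static_assert(sizeof(ValidFloat) >= sizeof(float) + sizeof(bool),
              "struct holds both members");

// Bytes spent per sample on something other than the float itself.
constexpr std::size_t overhead_bytes = sizeof(ValidFloat) - sizeof(float);
```

Printing sizeof(ValidFloat) and overhead_bytes on each target platform shows how much memory bandwidth the flag really costs there.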

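By the way, you should consider template specialization, so that Valid<float> uses NaNs if they are available and fast, and a valid bit otherwise.

#include <cmath>  // std::isnan

template <> class Valid<float> {
    float value;
public:
    bool is_valid() const {
        // A NaN never compares equal to anything, itself included, so
        // `value != my_special_NaN` would always be true. Use isnan instead.
        return !std::isnan(value);
    }
};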

etc.

Anyway, you are better off having as few valid bits as possible, and packing them elsewhere, rather than keeping a Valid<T> flag right next to each value. E.g.

struct Point { float x, y, z; };
Valid<Point> pt;

is better (density wise) than

struct Point_with_Valid_Coords { Valid<float> x, y, z; };

unless you are using NaNs – or some other special encoding.

And

struct Point_with_Valid_Coords { float x, y, z; bool valid_x, valid_y, valid_z; };

is in between – but then you have to do all the code yourself.
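Taking “packing them elsewhere” to its conclusion, the valid bits can live in a separate bitmask, one bit per sample, so the float array stays dense. The following is a minimal sketch; MaskedSamples and its member names are hypothetical, and there is no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: samples stored densely, plus one validity bit each.
class MaskedSamples {
    std::vector<float> values_;
    std::vector<std::uint64_t> mask_;  // bit (i % 64) of word (i / 64) is set when sample i is valid
public:
    // All samples start out invalid.
    explicit MaskedSamples(std::size_t n)
        : values_(n, 0.0f), mask_((n + 63) / 64, 0) {}

    std::size_t size() const { return values_.size(); }

    // Stores a value and marks it valid.
    void set(std::size_t i, float v) {
        values_[i] = v;
        mask_[i / 64] |= (std::uint64_t(1) << (i % 64));
    }
    // Clears the validity bit; the stored float is left untouched.
    void invalidate(std::size_t i) {
        mask_[i / 64] &= ~(std::uint64_t(1) << (i % 64));
    }
    bool is_valid(std::size_t i) const {
        return ((mask_[i / 64] >> (i % 64)) & 1u) != 0;
    }
    float value(std::size_t i) const { return values_[i]; }
};
```

This costs one bit per sample instead of a padded bool, at the price of writing the bookkeeping yourself.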

BTW, I have been assuming you are using C++. If FORTRAN or Java …

BOTTOM LINE: separate valid bits are probably faster and more portable.

But NaN handling is speeding up, and one day soon will be good enough.

By the way, my preference: create a Valid<T> template. Then you can use it for all data types. Specialize it for NaNs if it helps. Although my life is making things faster, IMHO it is usually more important to make the code clean.
