From Nvidia release notes: The nvcc compiler switch, --fmad (short name: -fmad), to control

Question


Asked: June 10, 2026


From Nvidia release notes:

 The nvcc compiler switch, --fmad (short name: -fmad), to control the contraction of
 floating-point multiplies and add/subtracts into floating-point multiply-add
 operations (FMAD, FFMA, or DFMA) has been added:
 --fmad=true and --fmad=false enables and disables the contraction respectively.
 This switch is supported only when the --gpu-architecture option is set with
 compute_20, sm_20, or higher. For other architecture classes, the contraction is
 always enabled.
 The --use_fast_math option implies --fmad=true, and enables the contraction.
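To make "contraction" concrete, here is a minimal sketch of the three ways a multiply followed by an add can be written. The kernel name `contraction_demo` and the input values are made up for illustration. The plain expression is the one the --fmad switch governs; the `__fmul_rn`/`__fadd_rn` intrinsics are documented as never being merged into an FMA, and `__fmaf_rn` always requests the fused operation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: the same a*x+y written three ways.
__global__ void contraction_demo(float a, const float* x, const float* y,
                                 float* plain, float* separate, float* fused,
                                 int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Candidate for contraction: may become one FFMA when --fmad=true.
    plain[i] = a * x[i] + y[i];
    // Explicitly rounded multiply and add: never contracted, whatever --fmad is.
    separate[i] = __fadd_rn(__fmul_rn(a, x[i]), y[i]);
    // Explicit fused multiply-add: one rounding, independent of --fmad.
    fused[i] = __fmaf_rn(a, x[i], y[i]);
}

int main() {
    const int n = 4;
    // Arbitrary sample inputs (assumptions, not taken from the question).
    const float hx[n] = {1.0f / 3.0f, 0.1f, 1.0e-3f, 3.14159f};
    const float hy[n] = {-1.0f / 9.0f, -0.01f, 1.0f, -9.8696f};

    float *dx, *dy, *dres;  // dres holds the three result arrays back to back
    cudaMalloc((void**)&dx, n * sizeof(float));
    cudaMalloc((void**)&dy, n * sizeof(float));
    cudaMalloc((void**)&dres, 3 * n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    contraction_demo<<<1, n>>>(1.0f / 3.0f, dx, dy,
                               dres, dres + n, dres + 2 * n, n);

    float h[3 * n];
    cudaMemcpy(h, dres, sizeof(h), cudaMemcpyDeviceToHost);  // blocks until done
    for (int i = 0; i < n; ++i)
        printf("%d plain=%.9g separate=%.9g fused=%.9g\n",
               i, h[i], h[n + i], h[2 * n + i]);

    cudaFree(dx); cudaFree(dy); cudaFree(dres);
    return 0;
}
```

Building this once with -fmad=true and once with -fmad=false should only change which of the other two columns the `plain` column follows; whether the printed values actually differ depends on the inputs.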

I have two kernels: one is purely compute-bound with lots of multiplications, whereas the other is memory-bound. I see a consistent performance improvement (around 5%) for my compute-intensive kernel when I compile with -fmad=false, and a decline of about the same percentage when I turn FMA contraction off for my memory-bound kernel.
So FMA contraction helps my memory-bound kernel, but my compute-bound kernel can squeeze out a little more performance with it turned off.
What could be the reason?
My device is an M2090 and I am using CUDA 4.2.

Full compilation options:
-arch,sm_20,-ftz=true,-prec-div=false,-prec-sqrt=false,-use_fast_math,-fmad=false (or I simply omit -fmad=false, in which case contraction is enabled, since --fmad=true is the default and is also implied by -use_fast_math).


1 Answer

Editorial Team · Answer 1 · 2026-06-10T13:48:02+00:00

Use of FMA may increase register pressure slightly, because three source operands must be available at the same time. Turning FMA generation on or off can therefore lead to small differences in instruction scheduling and register allocation, which in turn can lead to small performance differences. For a compute-bound kernel with many multiply-add idioms, -fmad=true should make a significant performance difference. But as you say, your kernel is dominated by multiplies, so it benefits little from FMA, and any gains may be offset by the register pressure and instruction scheduling effects.
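The difference between a multiply-dominated kernel and one built from multiply-add idioms can be sketched as follows. Both kernels (`product_chain` and `horner`), the loop length of 16, and the constant 0.5f are hypothetical stand-ins, not the asker's code. In the first kernel no add ever consumes a product, so the compiler has nothing to contract; in the second, every loop step is a candidate for a single FFMA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Multiply-dominated: only products, so --fmad has nothing to fuse here.
__global__ void product_chain(const float* x, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        float p = v;
        for (int k = 0; k < 16; ++k) p *= v;  // multiplies only
        out[i] = p;
    }
}

// Multiply-add idiom (Horner-style recurrence): each step is mul followed by add,
// which the compiler may contract into one FFMA when --fmad=true.
__global__ void horner(const float* x, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        float acc = 1.0f;
        for (int k = 0; k < 16; ++k) acc = acc * v + 0.5f;
        out[i] = acc;
    }
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *x, *out;
    cudaMalloc((void**)&x, n * sizeof(float));
    cudaMalloc((void**)&out, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));  // all-zero input, enough to run the kernels

    product_chain<<<blocks, threads>>>(x, out, n);
    horner<<<blocks, threads>>>(x, out, n);
    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(x);
    cudaFree(out);
    return 0;
}
```

One way to check the register-pressure explanation on a real kernel is to build it twice, with -fmad=true and -fmad=false, adding -Xptxas -v so that ptxas reports the registers used per kernel, and then to look at the generated machine code with cuobjdump -sass to see whether FFMA or separate FMUL/FADD instructions were emitted. Newer toolkits no longer accept sm_20, so a different -arch value would be needed there.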
