Why hasn't atomicAdd() for doubles been implemented explicitly as a part of CUDA 4.0?

Asked: June 12, 2026

Why hasn't atomicAdd() for doubles been implemented explicitly as a part of CUDA 4.0 or higher?

According to Appendix F (page 97) of the CUDA Programming Guide 4.1, the following versions of
atomicAdd have been implemented:

int atomicAdd(int* address, int val);
unsigned int atomicAdd(unsigned int* address,
                       unsigned int val);
unsigned long long int atomicAdd(unsigned long long int* address,
                                 unsigned long long int val);
float atomicAdd(float* address, float val);

The same page goes on to give a small implementation of atomicAdd for doubles, shown below,
which I have just started using in my project:

__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull =
                             (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                               __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

Why not define the above code as a part of CUDA?

1 Answer

Editorial Team · Answer 1 · 2026-06-12T00:13:02+00:00

Edit: As of CUDA 8, double-precision atomicAdd() is built into CUDA, with hardware support on compute capability 6.x (Pascal) and later GPUs.

~~Currently, no CUDA devices support atomicAdd for double in hardware.~~ As you noted, it can be implemented in terms of atomicCAS on 64-bit integers, but there is a non-trivial performance cost for that.

Therefore, the CUDA software team chose to document a correct implementation as an option for developers, rather than make it part of the CUDA standard library. This way developers are not unknowingly opting in to a performance cost they don’t understand.
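For code that has to build for both older and newer GPUs, the two cases above (the built-in overload from CUDA 8 on compute capability 6.x, and the atomicCAS loop everywhere else) can be combined behind one wrapper. What follows is only a minimal sketch: the names atomicAddDouble, sumKernel, and sumOnDevice are hypothetical and chosen for illustration, the file is assumed to be compiled with nvcc, and error checking on the runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical wrapper name, chosen so it never clashes with the built-in
// atomicAdd(double*, double) overload that exists on sm_60 and later.
__device__ double atomicAddDouble(double* address, double val)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    // Compute capability 6.x and later: use the hardware-backed built-in.
    return atomicAdd(address, val);
#else
    // Older devices: the compare-and-swap loop from the programming guide.
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                                             __longlong_as_double(assumed)));
        // Retry if another thread changed the value between read and swap.
    } while (assumed != old);
    return __longlong_as_double(old);
#endif
}

// Hypothetical kernel: every thread adds one input element into *out.
__global__ void sumKernel(const double* in, int n, double* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAddDouble(out, in[i]);
    }
}

// Hypothetical host helper: copies n doubles to the device, sums them with
// sumKernel, and returns the result. Error checking is omitted.
double sumOnDevice(const double* hostIn, int n)
{
    double* dIn = nullptr;
    double* dOut = nullptr;
    double result = 0.0;

    cudaMalloc(&dIn, n * sizeof(double));
    cudaMalloc(&dOut, sizeof(double));
    cudaMemcpy(dIn, hostIn, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(double));  // all-zero bits represent 0.0

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    sumKernel<<<blocks, threads>>>(dIn, n, dOut);

    // This blocking copy waits for the kernel to finish first.
    cudaMemcpy(&result, dOut, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Using a separate wrapper name keeps the fallback from colliding with the built-in overload on newer architectures. Because every thread here targets the same address, the fallback path serializes under contention, which is the performance cost described above.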

Aside: I don’t think this question should be closed as “not constructive”. I think it’s a perfectly valid question, +1.
