The problem:
I have a .h file in which I want to define real to be double when compiling for C/C++ or for CUDA with compute capability >= 1.3. When compiling for CUDA with compute capability < 1.3, real should be float.
After many hours I came up with this (which does not work):
    #if defined(__CUDACC__)
    #  warning * making definitions for cuda
    #  if defined(__CUDA_ARCH__)
    #    warning __CUDA_ARCH__ is defined
    #  else
    #    warning __CUDA_ARCH__ is NOT defined
    #  endif
    #  if (__CUDA_ARCH__ >= 130)
    #    define real double
    #    warning using double in cuda
    #  elif (__CUDA_ARCH__ >= 0)
    #    define real float
    #    warning using float in cuda
    #    warning how the hell is this printed when __CUDA_ARCH__ is not defined?
    #  else
    #    define real
    #    error what the hell is the value of __CUDA_ARCH__ and how can I print it
    #  endif
    #else
    #  warning * making definitions for c/c++
    #  define real double
    #  warning using double for c/c++
    #endif
when I compile (note the -arch flag)
nvcc -arch compute_13 -Ilibcutil testFloatDouble.cu
I get
    * making definitions for cuda
    __CUDA_ARCH__ is defined
    using double in cuda
    * making definitions for cuda
    warning __CUDA_ARCH__ is NOT defined
    warning using float in cuda
    how the hell is this printed if __CUDA_ARCH__ is not defined now?
    Undefined symbols for architecture i386:
    "myKernel(float*, int)", referenced from: ....
I know that files get compiled twice by nvcc. The first pass is OK (__CUDACC__ defined and __CUDA_ARCH__ >= 130), but what happens the second time?
Is __CUDACC__ defined but __CUDA_ARCH__ undefined, or defined with a value < 130? Why?
Thanks for your time.
It seems you might be conflating two things – how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two.
__CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second. Consider the following code snippet:
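A minimal sketch of what such a snippet might look like. The kernel name add and the TRAJECTORY macro are hypothetical names chosen for illustration, and the 130 threshold is taken from the question. The architecture-dependent instantiation is only meant to show which stanza each pass sees; a kernel that is actually launched from the host should be instantiated with the same template arguments in both passes.

```cpp
#ifdef __CUDACC__
// nvcc is steering compilation: both its host and device passes see this.

// `add` is a hypothetical kernel used only for illustration.
template <typename T>
__global__ void add(const T *x, const T *y, T *z)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    z[idx] = x[idx] + y[idx];
}

#ifdef __CUDA_ARCH__
// Device pass: __CUDA_ARCH__ holds the target architecture
// (130 for compute_13, as in the question).
#define TRAJECTORY "nvcc device pass"
#if __CUDA_ARCH__ >= 130
template __global__ void add<double>(const double *, const double *, double *);
#else
template __global__ void add<float>(const float *, const float *, float *);
#endif
#else
// Host pass of nvcc: __CUDACC__ is defined, __CUDA_ARCH__ is not.
#define TRAJECTORY "nvcc host pass"
#endif

#else
// Ordinary host compiler, no nvcc involved: neither macro is defined.
#define TRAJECTORY "plain host compiler"
#endif
```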
Here we have a templated CUDA kernel with CUDA architecture dependent instantiation, a separate stanza for host code steered by nvcc, and a stanza for compilation of host code not steered by nvcc. This behaves as follows:
The takeaway points from this are:
- __CUDACC__ defines whether nvcc is steering compilation or not
- __CUDA_ARCH__ is always undefined when compiling host code, steered by nvcc or not
- __CUDA_ARCH__ is only defined for the device code trajectory of compilation steered by nvcc
Those three pieces of information are always enough to have conditional compilation for device code to different CUDA architectures, host side CUDA code, and code not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in the discussion on compilation trajectories.
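The second point also explains the puzzling "using float" warning in the question: inside #if, an identifier that is not defined as a macro is replaced by 0, so an unguarded comparison against __CUDA_ARCH__ still evaluates in host code. The sketch below contrasts the unguarded test with one that checks defined() first. UNGUARDED_BRANCH and GUARDED_BRANCH are hypothetical names introduced only for illustration.

```cpp
// Unguarded comparison, as in the question. In host code __CUDA_ARCH__ is
// undefined, so the preprocessor substitutes 0 for it.
#if (__CUDA_ARCH__ >= 130)
#define UNGUARDED_BRANCH 1   // device pass, compute capability >= 1.3
#elif (__CUDA_ARCH__ >= 0)
#define UNGUARDED_BRANCH 2   // device pass < 1.3, but also host code (0 >= 0)
#else
#define UNGUARDED_BRANCH 3   // not reached for any non-negative value
#endif

// Guarded version: test defined() first so host code gets its own branch.
#if !defined(__CUDA_ARCH__)
#define GUARDED_BRANCH 0     // host code (nvcc host pass or plain compiler)
#elif (__CUDA_ARCH__ >= 130)
#define GUARDED_BRANCH 1     // device pass, compute capability >= 1.3
#else
#define GUARDED_BRANCH 2     // device pass, compute capability < 1.3
#endif
```

In the question's build this is why the host pass chose float while the device pass chose double, after which the linker could not find myKernel(float*, int).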