I have a piece of C++ CUDA code in which the data variables are declared as float. I also need a version of the same code with the data variables declared as double.
What is a good design to handle a situation like this in CUDA?
I do not want to maintain two copies of the same code, because every future change would then have to be made in two otherwise identical places. I also want to keep the code clean, without scattering #ifdef blocks throughout it to switch between float and double.
Can anyone suggest a good design, in terms of maintainability and readability?
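CUDA supports C++ templates, including templated kernels, and they are the cleanest and most efficient way to implement kernel code that must handle multiple types: you write the code once, and the compiler generates a separate, fully type-specialized kernel for each type you use, with no runtime overhead.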
As a trivial example, consider a simple BLAS AXPY-type kernel (y = a*x + y):
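A minimal sketch of such a kernel, assuming a grid-stride loop and an unsigned int element count. The host helper run_axpy and the launch configuration of 4 blocks of 128 threads are illustrative choices (hypothetical, not part of any library), and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Templated AXPY kernel: y[i] = a * x[i] + y[i] for 0 <= i < n.
// The body is written once; T is supplied when the template is instantiated.
template <typename T>
__global__ void axpy(const T a, const T *x, T *y, const unsigned int n)
{
    // Grid-stride loop, so any launch configuration covers all n elements.
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    const unsigned int stride = blockDim.x * gridDim.x;
    for (; tid < n; tid += stride) {
        y[tid] += a * x[tid];
    }
}

// Hypothetical host helper: copies x and y to the device, runs axpy<T>,
// and returns the updated y. Error checking is omitted for brevity.
template <typename T>
std::vector<T> run_axpy(T a, const std::vector<T> &x, std::vector<T> y)
{
    const unsigned int n = static_cast<unsigned int>(x.size());
    T *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, n * sizeof(T));
    cudaMalloc(&d_y, n * sizeof(T));
    cudaMemcpy(d_x, x.data(), n * sizeof(T), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), n * sizeof(T), cudaMemcpyHostToDevice);
    axpy<T><<<4, 128>>>(a, d_x, d_y, n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(y.data(), d_y, n * sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}

int main()
{
    std::vector<float> x(1000, 1.0f), y(1000, 2.0f);
    std::vector<float> r = run_axpy<float>(3.0f, x, y);
    std::printf("r[0] = %f\n", r[0]);  // 3 * 1 + 2 = 5
    return 0;
}
```

Nothing in the kernel body mentions float or double, so the same source serves both precisions.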
This templated kernel can be instantiated for both double and single precision without loss of generality:
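A sketch of explicit instantiation for float and double, repeating the kernel so that the snippet compiles on its own. Within a single .cu file, launching axpy<float> or axpy<double> would already instantiate it implicitly; explicit instantiation mainly matters when the kernel is defined in one translation unit and launched from another. The helper axpy_first is hypothetical and uses managed memory to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

template <typename T>
__global__ void axpy(const T a, const T *x, T *y, const unsigned int n)
{
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    const unsigned int stride = blockDim.x * gridDim.x;
    for (; tid < n; tid += stride) {
        y[tid] += a * x[tid];
    }
}

// Explicit instantiations: emit device code for both precisions in this
// translation unit, so other files can launch them.
template __global__ void axpy<float>(const float, const float *, float *, const unsigned int);
template __global__ void axpy<double>(const double, const double *, double *, const unsigned int);

// Hypothetical helper: fills x with 1 and y with 2 in managed memory,
// runs axpy<T>, and returns y[0] after the update.
template <typename T>
T axpy_first(T a, unsigned int n)
{
    T *x = nullptr, *y = nullptr;
    cudaMallocManaged(&x, n * sizeof(T));
    cudaMallocManaged(&y, n * sizeof(T));
    for (unsigned int i = 0; i < n; ++i) { x[i] = T(1); y[i] = T(2); }
    axpy<T><<<4, 128>>>(a, x, y, n);
    cudaDeviceSynchronize();  // required before the host reads managed memory
    T result = y[0];
    cudaFree(x);
    cudaFree(y);
    return result;
}

int main()
{
    std::printf("float:  %f\n", axpy_first<float>(3.0f, 1000));
    std::printf("double: %f\n", axpy_first<double>(3.0, 1000));
    return 0;
}
```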
The Thrust template library, which ships with all recent versions of the CUDA toolkit, makes extensive use of this facility to implement type-agnostic algorithms.
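For comparison, a minimal sketch of the same AXPY operation written with Thrust, assuming thrust::device_vector and thrust::transform with a functor templated on the element type (similar in spirit to the SAXPY example in Thrust's documentation). The names axpy_functor and thrust_axpy_first are hypothetical, chosen for illustration.

```cuda
#include <cstddef>
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Functor computing a * x + y, templated on the element type.
template <typename T>
struct axpy_functor
{
    const T a;
    explicit axpy_functor(T a_) : a(a_) {}

    __host__ __device__ T operator()(const T &x, const T &y) const
    {
        return a * x + y;
    }
};

// Hypothetical helper: computes y <- a * x + y on the device with
// thrust::transform and returns y[0]. The same code serves float and double.
template <typename T>
T thrust_axpy_first(T a, std::size_t n)
{
    thrust::device_vector<T> x(n, T(1));
    thrust::device_vector<T> y(n, T(2));
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      axpy_functor<T>(a));
    return y[0];  // device_reference<T> converts to T (device-to-host copy)
}

int main()
{
    std::printf("float:  %f\n", thrust_axpy_first<float>(3.0f, 1000));
    std::printf("double: %f\n", thrust_axpy_first<double>(3.0, 1000));
    return 0;
}
```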