I made a macro to simplify CUDA kernel calls:
#define LAUNCH LAUNCH_ASYNC
#define LAUNCH_ASYNC(kernel_name, gridsize, blocksize, ...) \
    LOG("Async kernel launch: " #kernel_name); \
    kernel_name <<< (gridsize), (blocksize) >>> (__VA_ARGS__);
#define LAUNCH_SYNC(kernel_name, gridsize, blocksize, ...) \
    LOG("Sync kernel launch: " #kernel_name); \
    kernel_name <<< (gridsize), (blocksize) >>> (__VA_ARGS__); \
    cudaDeviceSynchronize(); \
    // error check, etc...
Usage:
LAUNCH(my_kernel, 32, 32, param1, param2)
LAUNCH(my_kernel<int>, 32, 32, param1, param2)
This works fine; by changing the first define I can enable synchronous calls and error checking for debugging.
However, it does not work with multiple template arguments, as below:
LAUNCH(my_kernel<int,float>, 32, 32, param1, param3)
The error message I get on the line where I call the macro:
error : expected a ">"
Is it possible to make this macro work with multiple template arguments?
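The problem is that the preprocessor knows nothing about angle bracket nesting, so it treats the comma inside them as a macro argument separator: my_kernel<int,float> is split into two arguments, my_kernel<int and float>, and every later argument shifts by one position. Only parentheses protect a comma from being read as a separator.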
If the kernel-launch syntax supports parentheses around the kernel name (I can’t check now, not on a CUDA machine), you could do this: