I am writing a very long CUDA kernel, and it has become hard to read. Is there a way to organize a CUDA kernel by moving parts of it into functions defined outside the kernel?
Example:
__global__ void CUDA_Kernel(int* a, int* b){
    //calling function 1
    //calling function 2
    //calculation function
    .......
}
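A function can be called from inside a kernel if it is declared with the __device__ qualifier. Such a function is compiled for the GPU and can be called only from device code, that is, from kernels or from other __device__ functions.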
For example:
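Here is a minimal sketch of that pattern. The helper names add_one and times_two, the extra length parameter n, and the host-side wrapper run_example are all hypothetical and chosen only for illustration; they are not taken from the question.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: compiled for the GPU, callable only from device code.
__device__ int add_one(int x) { return x + 1; }

// Second hypothetical helper, to show that helpers can be composed.
__device__ int times_two(int x) { return 2 * x; }

// The kernel body stays short because the work lives in the helpers.
// The length parameter n is an assumption, added for the bounds check.
__global__ void CUDA_Kernel(int* a, int* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        b[i] = times_two(add_one(a[i]));
    }
}

// Hypothetical host wrapper: copies the input to the GPU, launches the
// kernel, and copies the result back. Returns the first CUDA error seen.
cudaError_t run_example(int* h_a, int* h_b, int n) {
    int* d_a = nullptr;
    int* d_b = nullptr;
    size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc(&d_a, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_b, bytes);
    if (err != cudaSuccess) { cudaFree(d_a); return err; }

    err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n
        CUDA_Kernel<<<blocks, threads>>>(d_a, d_b, n);
        err = cudaGetLastError();  // reports launch failures
    }
    if (err == cudaSuccess) {
        // This copy also waits for the kernel to finish.
        err = cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_a);
    cudaFree(d_b);
    return err;
}
```

Calling run_example(in, out, n) from main with two host arrays of length n should leave out[i] equal to 2 * (in[i] + 1). If a helper also needs to be callable from host code, it can be declared __host__ __device__ instead.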