I have done this in other apps, but for some reason it's not working in my current one.
Here is a code snippet. I am working in VS2010 with CUDA 4.2. I have compiled the PTX file both inside and outside VS, and neither resolves the problem:
CUmodule Module = NULL;
int rc7 = cuModuleLoad(&Module, CubinName); // needs bin
if (rc7 == 0) {
    rc = cuModuleGetFunction( &cuF_makeProcFrame, Module, "makeProcFrame" );
}
I am getting rc=500 (CUDA_ERROR_NOT_FOUND), i.e. function not found.
When I open the PTX file in a text editor I see:
.entry _Z13makeProcFrame14cudaPitchedPtriiii(
.param .align 4 .b8 _Z13makeProcFrame14cudaPitchedPtriiii_param_0[16],
.param .u32 _Z13makeProcFrame14cudaPitchedPtriiii_param_1,
.param .u32 _Z13makeProcFrame14cudaPitchedPtriiii_param_2,
.param .u32 _Z13makeProcFrame14cudaPitchedPtriiii_param_3,
.param .u32 _Z13makeProcFrame14cudaPitchedPtriiii_param_4
)
And finally, here is the kernel declaration in the CUDA code itself:
__global__ void makeProcFrame(
    cudaPitchedPtr YProcBasePtr,
    int numFrames,
    int width,
    int height,
    int lineBytes
)
Can anyone tell me why I am getting an error return instead of the function being found?
Edit: here is the batch file I use for compiling:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_20,code=sm_20 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -O -G --machine 32 --maxrregcount=0 -ptx -o="filterKernelHand.ptx" filterKernel.cu
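Use extern "C" when declaring CUDA kernels; that way the compiler will not mangle the function name. Your PTX shows that the kernel was emitted under its C++-mangled name, _Z13makeProcFrame14cudaPitchedPtriiii, so looking up "makeProcFrame" with cuModuleGetFunction fails with error 500 (CUDA_ERROR_NOT_FOUND).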
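As a minimal sketch of that fix, based on the declaration in the question: the kernel body is elided because it was not posted, and the only change is the added extern "C". After recompiling, the .entry line in the PTX should read makeProcFrame instead of the mangled name.

```cuda
// filterKernel.cu (sketch): extern "C" gives the kernel C linkage, so the
// PTX entry point is emitted as "makeProcFrame" rather than the mangled
// "_Z13makeProcFrame14cudaPitchedPtriiii".
extern "C" __global__ void makeProcFrame(
    cudaPitchedPtr YProcBasePtr,
    int numFrames,
    int width,
    int height,
    int lineBytes
)
{
    // ... kernel body unchanged (not shown in the question) ...
}

// Host side, unchanged from the question; with the unmangled entry name
// this lookup should now succeed:
//   rc = cuModuleGetFunction(&cuF_makeProcFrame, Module, "makeProcFrame");
```

If you cannot change the kernel source, the alternative is to pass the mangled name exactly as it appears in the PTX ("_Z13makeProcFrame14cudaPitchedPtriiii") to cuModuleGetFunction. That is fragile, though, because the mangled name changes whenever the parameter list changes.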