I am trying to create a library of CUDA routines, but I am running into trouble. I will explain the problem using a minimal example; my actual library will be larger.
I have successfully written test.cu, a source file containing a __global__ CUDA function and a wrapper around it (to allocate and copy memory). I can also successfully compile this file into a shared library using the following commands:
nvcc -c test.cu -o test.o -lpthread -lrt -lcuda -lcudart -Xcompiler -fPIC
gcc -m64 -shared -fPIC -o libtest.so test.o -lpthread -lrt -lcuda -lcudart -L/opt/cuda/lib64
The resulting libtest.so exports all my needed symbols.
I now compile my purely C main.c and link it against my library:
gcc -std=c99 main.c -o main -lpthread -ltest -L.
This step is also successful, but upon executing ./main all CUDA functions that are called return an error:
test.cu:17:cError(): cudaGetDeviceCount: [38] no CUDA-capable device is detected
test.cu:17:cError(): cudaMalloc: [38] no CUDA-capable device is detected
test.cu:17:cError(): cudaMemcpy: [38] no CUDA-capable device is detected
test.cu:17:cError(): cudaMemcpy: [38] no CUDA-capable device is detected
test.cu:17:cError(): cudaFree: [38] no CUDA-capable device is detected
(Error messages are created through a debugging function of my own)
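For readers wondering how report lines of that shape can be produced, here is a minimal sketch in plain C. The helper name format_cuda_error and its parameters are hypothetical and are not the poster's actual cError(); the only thing taken from the post is the layout "file:line:cError(): call: [code] message".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not the poster's cError()):
 * writes one report line of the form
 *   file:line:cError(): call: [code] message
 * into buf and returns the length the full line would have had. */
int format_cuda_error(char *buf, size_t size, const char *file, int line,
                      const char *call, int code, const char *message)
{
    assert(buf != NULL && size > 0);
    /* snprintf NUL-terminates when size > 0; a return value >= size
     * means the line was truncated. */
    return snprintf(buf, size, "%s:%d:cError(): %s: [%d] %s",
                    file, line, call, code, message);
}
```

Inside a .cu file, the code and message arguments would presumably be the cudaError_t value returned by the failing call and the string from cudaGetErrorString(), with __FILE__ and __LINE__ supplying the location.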
I ran into exactly the same problem earlier, when I was building an executable directly from test.cu, because I had forgotten to link against libpthread (-lpthread). As you can see above, however, every compile and link step now includes -lpthread. According to ldd, both libtest.so and main depend on libpthread, as they should.
I am using CUDA 5 (yes, I do realize it is a beta) with gcc 4.6.3 and nvidia driver version 302.06.03 on ArchLinux.
Some help in solving this problem would be more than appreciated!
Here’s a trivial example…
Compile/link with
nvcc -m64 -arch=sm_20 -o libtest.so --shared -Xcompiler -fPIC test.cu
Compile/link with
gcc -std=c99 -o main -L. -ltest main.c