I’m trying to determine where a slowdown is occurring in my GPU code. I’ve verified that the code runs correctly on its own (it doesn’t throw any errors, outputs are correct, finishes cleanly, etc). When I try to profile the code in Visual Profiler, it seems to run normally, dumping correct intermediate outputs to stdout. The GPU is being used (I’ve checked with cuda-gdb and dumping printf()s from inside my kernels). Once all the code has completed, Visual Profiler reports that viper has terminated the executable. However, no timeline is generated. Instead, the main window shows 0, 10, 20, 25 microseconds all “collapsed” on top of one another. When I tell the Visual Profiler to run all analysis options, it proceeds through the 24 runs without problems, but still no timeline is generated.
I’m using CUDA 4.2, driver version 295.41 on Ubuntu x86_64 with a GeForce 460.
When the visual profiler fails to generate a timeline it is typically because it cannot locate a component required for profiling. This component is a shared library found in /usr/local/cuda/lib64 called libcuinj.so. Is that path on your LD_LIBRARY_PATH? How are you launching the Visual Profiler? The script in /usr/local/cuda/bin/nvvp should set the path correctly for you.
The 4.2 version of the visual profiler does not do a good job of reporting errors when this shared library is not found. The upcoming 5.0 version of the visual profiler has much better error reporting in this regard.