See the image below of an Nvidia Nsight 2.2 profiling session (Win7, MSVC++ 10 Pro, CUDA 4.2, GTX 670).

On the first host thread (26.8%) I get the function call names from the CUDA API. Is it possible to get the names of the user-defined functions being executed by the second thread (13.6%) in the host process? If so, how?
Paul, this is not supported by default.
However, you can instrument your code manually using the NVIDIA Tools Extension (NVTX) library. The library is installed in the directory C:\Program Files\NVIDIA GPU Computing Toolkit\nvToolsExt with Nsight Visual Studio Edition (all versions) or CUDA Toolkit 5.0 RC. NVTX is also supported by the Visual Profiler in 5.0 RC.
The library comes with two samples that show how to use it. The NvtxMultithread.cpp sample provides a helper library and shows the functions of interest; if you are using C++, the helper library also has a scoped helper that can be used at the top of each function.
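For illustration, here is a minimal sketch of both styles. It is not the code from the NvtxMultithread.cpp sample. It assumes the plain NVTX C calls nvtxRangePushA and nvtxRangePop from nvToolsExt.h; the ScopedRange class and the two worker functions are hypothetical names made up for this example. Without USE_NVTX defined, small stand-ins record the calls in a log so the sketch compiles on a machine that has no NVTX installed.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef USE_NVTX
#include <nvToolsExt.h>  // real NVTX header; link against the nvToolsExt library
#else
// Stand-ins (assumption: used only so the sketch builds without NVTX).
// They mimic the push/pop return values and log every call.
static std::vector<std::string> g_rangeLog;
static int g_rangeDepth = 0;

inline int nvtxRangePushA(const char* name) {
    g_rangeLog.push_back(std::string("push ") + name);
    return g_rangeDepth++;  // zero-based depth of the range being started
}

inline int nvtxRangePop() {
    assert(g_rangeDepth > 0);  // a pop must match an earlier push
    g_rangeLog.push_back("pop");
    return --g_rangeDepth;     // zero-based depth of the range being ended
}
#endif

// Hypothetical scoped helper: opens a named range on construction and
// closes it on destruction, so every return path ends the range.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) { nvtxRangePushA(name); }
    ~ScopedRange() { nvtxRangePop(); }
private:
    // Non-copyable, written in the pre-C++11 style that MSVC++ 10 accepts.
    ScopedRange(const ScopedRange&);
    ScopedRange& operator=(const ScopedRange&);
};

// C style: push at the top, pop before every return.
int workerStepC(int x) {
    nvtxRangePushA("workerStepC");
    int result = x * 2;
    nvtxRangePop();
    return result;
}

// C++ style: one line at the top of the function.
int workerStepCpp(int x) {
    ScopedRange range("workerStepCpp");
    return workerStepC(x) + 1;
}
```

Calling workerStepCpp from the second host thread should then show two nested ranges named workerStepCpp and workerStepC on that thread's row in the timeline. The scoped version is usually less error-prone because a function with several return statements cannot forget the matching pop.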
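It is possible to automate this with the cl.exe options /Gh and /GH, which make the compiler insert a call to the hook functions _penter and _pexit at the entry and exit of every function. You have to supply those hooks yourself, and that requires writing assembly.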