I use the Linux time command to measure the running time of my CUDA program, and it reports something like this:
real 0m10.269s
user 0m6.520s
sys 0m5.336s
My question is: Is the GPU execution time included in the sys part or the user part?
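You can't tell from that output, and it could even be neither of them. user and sys only count CPU time spent by the host process (in user mode and in the kernel, respectively). Time the GPU spends executing kernels is not CPU time, so it only shows up reliably in real. What lands in user or sys while the GPU is busy depends on how the host thread waits for it (spinning or blocking) and on how much work the driver does.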
To time CUDA work, use the event timers built into the CUDA runtime (cudaEventRecord and cudaEventElapsedTime); the CUDA C++ Best Practices Guide describes them.
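
As a rough illustration of timing on the GPU side, here is a minimal sketch using CUDA events. The kernel name scale, the array size, and the launch configuration are made up for the example and stand in for your real workload; error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stands in for whatever work you actually want to time.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // assumed problem size, for illustration only
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Events are timestamps recorded by the GPU in stream order.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                           // enqueue start marker
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f); // asynchronous launch
    cudaEventRecord(stop);                            // enqueue stop marker

    // Block the host until the stop event has actually been reached.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);           // elapsed time in milliseconds
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Compile with something like nvcc timing.cu -o timing. The number printed is the time between the two events as measured on the GPU, so it does not depend on how the host thread waits and is not tied to the user or sys figures from time.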