I’m writing an article about GPU speedup in a cluster environment.
To do that, I’m programming in CUDA, which is basically a C++ extension.
But as I’m a C# developer, I don’t know the particularities of C++.
Are there any concerns about logging elapsed time? Any suggestions or blogs to read?
My initial idea is to make a big loop and run the program several times (50 to 100), logging every elapsed time so that I can plot some speed graphs afterwards.
Standard functions such as time() often have a very low resolution. And yes, a good way to get around this is to run your test many times and take an average. Note that the first few runs may be extra slow because of hidden start-up costs, especially when using complex resources like GPUs.

For platform-specific calls, take a look at QueryPerformanceCounter on Windows and CFAbsoluteTimeGetCurrent on OS X. (I’ve not used the POSIX call clock_gettime, but that might be worth checking out.)

Measuring GPU performance is tricky because GPUs are remote processing units running separate instructions, often on many parallel units. You might want to visit Nvidia’s CUDA Zone for a variety of resources and tools to help measure and optimize CUDA code. (Resources related to OpenCL are also highly relevant.)
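As a rough illustration of the "run it many times and average" advice, here is a minimal sketch in standard C++ (which CUDA host code can use as well). It swaps in std::chrono::steady_clock as a portable alternative to the platform-specific calls above. The names time_runs and mean_ms are made up for this example, and the warm-up and run counts are up to you. One assumption to keep in mind: CUDA kernel launches are asynchronous, so the callable you pass in should wait for the GPU to finish (for example with cudaDeviceSynchronize()) before it returns, otherwise you only time the launch.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: runs `work` `warmup` times without timing it
// (to absorb start-up costs), then `runs` more times with timing.
// Returns one elapsed time in milliseconds per timed run.
template <typename Work>
std::vector<double> time_runs(Work&& work, std::size_t warmup, std::size_t runs) {
    using clock = std::chrono::steady_clock;  // monotonic clock

    for (std::size_t i = 0; i < warmup; ++i) {
        work();  // discarded: the first runs may be unusually slow
    }

    std::vector<double> elapsed_ms;
    elapsed_ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();  // for GPU work, this must block until the device is done
        const auto stop = clock::now();
        elapsed_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return elapsed_ms;
}

// Hypothetical helper: arithmetic mean of the samples (0.0 if there are none).
double mean_ms(const std::vector<double>& samples) {
    if (samples.empty()) {
        return 0.0;
    }
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return sum / static_cast<double>(samples.size());
}
```

You would call it along the lines of time_runs(my_benchmark, 5, 100), where my_benchmark stands for whatever function runs your program once, and then write the returned per-run values to a file for plotting. Keeping every sample rather than only the mean lets you see outliers in the graphs.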
Ultimately, you want to see how fast your results make it to the screen, right? For that reason, a call to time() might well suffice for your needs.