I have CUDA code that performs a calculation on the GPU.
I am using clock() to measure how long it takes.
My code is structured like this:
```cuda
#include <stdio.h>
#include <time.h>

__global__ static void sum() {
    // calculates sum
}

extern "C"
int run_kernel(int array[], int nelements) {
    clock_t start, end;
    start = clock();
    // perform operation on gpu - call sum
    end = clock();
    double elapsed_time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("time required : %lf\n", elapsed_time);
    return 0;
}
```
But the time printed is always 0.0000.
I checked by printing the start and end times: start has some value, but end is always zero.
Any idea what might be the cause? Are there alternative ways to measure the time?
Any help would be appreciated.
Thanks
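There are two problems here:

1. The clock() function has too low a resolution to measure the duration of the event you are trying to time.
2. A CUDA kernel launch is asynchronous: control returns to the host immediately, before the kernel has finished. Unless the host explicitly waits for the GPU, the code between start and end measures little more than the launch itself.

CUDA has its own high-precision timing API, and it is the recommended way to time operations that run on the GPU. The code to use it would look something like this: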
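A minimal sketch of event-based timing, assuming a CUDA-capable device and the CUDA runtime API. The `sum` kernel body and its parameters, the `CHECK` macro, and the `time_sum_kernel_ms` helper are hypothetical stand-ins, not the asker's code. The idea is to record one event before the launch and one after it, wait on the second event, then ask the runtime for the time between them; cudaEventElapsedTime reports milliseconds with a resolution of around half a microsecond.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the question's sum kernel: each thread adds
// one element of `in` into `*out` using atomicAdd.
__global__ void sum(const int *in, int n, int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(out, in[i]);
    }
}

// Times one launch of `sum` over n elements with CUDA events.
// Returns the elapsed GPU time in milliseconds.
float time_sum_kernel_ms(const int *d_in, int n, int *d_out)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    CHECK(cudaEventRecord(start, 0));          // enqueue "start" on the default stream
    sum<<<blocks, threads>>>(d_in, n, d_out);  // returns to the host immediately
    CHECK(cudaGetLastError());                 // catch launch configuration errors
    CHECK(cudaEventRecord(stop, 0));           // enqueue "stop" after the kernel
    CHECK(cudaEventSynchronize(stop));         // block the host until "stop" completes

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main(void)
{
    const int n = 1 << 20;
    int *h_in = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in, *d_out;
    CHECK(cudaMalloc(&d_in, n * sizeof(int)));
    CHECK(cudaMalloc(&d_out, sizeof(int)));
    CHECK(cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaMemset(d_out, 0, sizeof(int)));

    float ms = time_sum_kernel_ms(d_in, n, d_out);

    int h_out = 0;
    CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    printf("sum = %d, kernel time = %f ms\n", h_out, ms);

    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    free(h_in);
    return 0;
}
```

Because cudaEventSynchronize(stop) blocks the host until the kernel and the stop event have completed, the measured interval covers the kernel's execution rather than just the launch. The first launch in a process can include one-off setup overhead, so timing a warm-up run separately or averaging several runs gives steadier numbers.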