I’m running an image filter on the GPU and I need to measure how long each part of the program takes, for comparison. First I tried the time.h library, but it always returned zero. Then I read this post
and used the same code in my program before and after the kernel call, but it still returns zero. Can anyone tell me what the problem could be?
This is my code:
cudaEvent_t start,stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
float Elapsed=0,Cycle;
while(count)
{
cudaEventRecord(start,0);
ImgFilter<<<dimGrid,dimBlock>>>...
cudaEventRecord(stop,0);
cudaEventElapsedTime(&Cycle,start,stop);
Elapsed += Cycle;
}
printf("Time = %f",Elapsed);
I also tried printing ‘Cycle’ but it’s always zero.
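You are missing a call to the cudaEventSynchronize function on the stop event before reading the elapsed time.
Note that a kernel launch returns to the host before all CUDA threads have finished execution, so you also need to call cudaThreadSynchronize after launching the kernel if you time it with a host-side timer such as time.h.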
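To make the fix concrete, here is a minimal sketch of the timing loop with the synchronization added. The kernel dummyFilter, the helper timeFilterMs, the buffer size and the launch configuration are all placeholders I made up for illustration, since the real ImgFilter arguments are not shown in the question. Only the event calls are the actual CUDA runtime API. Note also that newer CUDA toolkits deprecate cudaThreadSynchronize in favor of cudaDeviceSynchronize; with events, cudaEventSynchronize on the stop event is enough.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the ImgFilter kernel from the question.
__global__ void dummyFilter(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Hypothetical helper: launches the kernel `iterations` times and returns
// the summed kernel time in milliseconds, or -1 if the buffer cannot be allocated.
float timeFilterMs(int n, int iterations)
{
    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dim3 dimBlock(256);
    dim3 dimGrid((n + dimBlock.x - 1) / dimBlock.x);

    float elapsed = 0.0f, cycle = 0.0f;
    for (int k = 0; k < iterations; ++k) {
        cudaEventRecord(start, 0);
        dummyFilter<<<dimGrid, dimBlock>>>(d_data, n);
        cudaEventRecord(stop, 0);
        // The launch above is asynchronous: wait until the stop event
        // (and therefore the kernel queued before it) has completed.
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&cycle, start, stop); // result is in milliseconds
        elapsed += cycle;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return elapsed;
}
```

From host code you could call it as printf("Time = %f ms\n", timeFilterMs(1 << 20, 10)); the value is in milliseconds because that is the unit cudaEventElapsedTime reports.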