I'm new to OpenCL and have run into some problems with an array addition example.
I am using the code provided in the link below,
and I added the following to measure the performance of the GPU:
clFinish(commandQueue);
// Queue the kernel up for execution across the array
// (the queue must be created with CL_QUEUE_PROFILING_ENABLE)
cl_ulong start, end; cl_event k_events;
errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,
                                globalWorkSize, localWorkSize,
                                0, NULL, &k_events);
// Profiling info is only valid once the kernel has finished
clWaitForEvents(1, &k_events);
clGetEventProfilingInfo(k_events, CL_PROFILING_COMMAND_START,
                        sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(k_events, CL_PROFILING_COMMAND_END,
                        sizeof(cl_ulong), &end, NULL);
double GPUTime = (double)(end - start); // nanoseconds
And this to measure the CPU time:
LARGE_INTEGER CPUstart, finish, freq;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&CPUstart);
for (int i = 0; i < ARRAY_SIZE; i++) {
    result[i] = a[i] + b[i];
}
QueryPerformanceCounter(&finish);
// Ticks divided by ticks-per-nanosecond gives nanoseconds
double timeCPU = (finish.QuadPart - CPUstart.QuadPart) / ((double)freq.QuadPart / 1000000000.0);
The first problem I encountered is the array size: it can't go beyond 10000. If I make it larger, the program just crashes. How can I fix this?
The second problem is the performance: the GPU/CPU ratio varies too widely, from 13% to roughly 210%. Why does this happen, and can you suggest a fix?
Edit: I figured out the second problem. The slowdown was caused by the power-saving mode, which set the core/memory clocks much lower than the defaults. After using a program to lock the clocks, the performance is stable at roughly 150-300% (GPU/CPU).
Good case:
GPU time :632667 nanosecs.
CPU time : 990023 nanosecs.
GPU/CPU ratio : 156.484 percent.
And a bad one:
GPU time :6.83267e+006 nanosecs.
CPU time : 1.00756e+006 nanosecs.
GPU/CPU ratio : 14.7462 percent.
Any ideas would be appreciated. Thank you 😀
PS: The CPU is a Core i3-370M and the GPU is an HD5470. I use VS2008 on Windows 7.
One possible (and the most probable) reason that your program crashes with bigger array sizes is the array declarations in main.cpp (lines 274-276 in the original code). These are automatic arrays, and space for them is allocated on the stack of the main function. The total space required is 3*ARRAY_SIZE*sizeof(float), which equals 12*ARRAY_SIZE bytes. The default stack size on Windows is 1 MiB (1048576 bytes), which means ARRAY_SIZE could be at most 87381. That is the upper limit given the default stack size; since the stack is also used for other things, the real limit is somewhat lower.

You can increase the stack size on the Linker -> System page of your VS project properties. Better still, allocate those arrays on the heap using malloc() or new[].
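
As a rough illustration of the heap approach, here is a minimal sketch that uses std::vector (which manages a new[]-style heap allocation for you) instead of raw malloc() or new[]. The helper names bytesForThreeArrays and addArrays are hypothetical and are not part of the original sample; the sketch also assumes sizeof(float) is 4, as it is with Visual C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes needed by three float arrays of n elements
// each (a, b and result), i.e. 3 * n * sizeof(float).
std::size_t bytesForThreeArrays(std::size_t n) {
    return 3 * n * sizeof(float);
}

// Hypothetical helper: element-wise sum. The vectors keep their elements
// on the heap, so the size is not limited by the 1 MiB default stack.
std::vector<float> addArrays(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}
```

With this, something like std::vector<float> a(ARRAY_SIZE) can replace the automatic array, and &a[0] yields a pointer to contiguous storage that can be passed wherever the sample currently passes the array (for example as the host pointer to clCreateBuffer). The vector frees its memory automatically when it goes out of scope.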