I’m developing this test application in C++ and OpenCL but I just CAN’T figure out why I am getting this very weird problem, which then results in a segmentation fault.
Some of the code:
output = new cl_float[TestCount*TrainCount]; // output array
output_buf = new cl::Buffer(*context, CL_MEM_WRITE_ONLY, sizeof(cl_float)*TestCount*TrainCount, NULL);
// .... some more stuff
queue->enqueueReadBuffer(*output_buf, CL_TRUE, 0, TestCount*TrainCount, output);
Here, output and output_buf are pointers to their respective data.
The seg-fault occurs when I try to access any element of the output array after everything has been processed. Upon further debugging, I found that the maximum number of elements it is storing is 562 whereas it should be 2250 (TestCount=150, TrainCount=15). Moreover, the surprising thing is that I can access any element from GDB but not 562 upwards.
I have no doubt that there is no error in the code and I’m absolutely sure that all the 2250 outputs are being processed by the GPU. This was tested by atomically incrementing the first element of the output array in each thread and then outputting it via GDB.
Seems like I have ruled out a lot of possibilities but for the life of me I still can’t figure out what’s causing this problem. There is a minuscule chance that the heap is getting filled but I top’ed and found that my application only uses like 37M of memory.
Any help would be appreciated!
UPDATE: James is right. It was because I was not reading enough bytes from the memory. The backtrace of the seg-fault is as follows:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000000000000c
0x0000000100002402 in main (argc=4, argv=0x7fff5fbffb88)
at TestApplication.cpp:202
202 cout << OpenCL_KNN::output[0] << " " << OpenCL_KNN::output[1] << " " << OpenCL_KNN::output[2] << " " << OpenCL_KNN::output[3];
The 4 indices being accessed are definitely defined. Nothing before this line gives me any error. The output array isn’t altered/created anywhere other than what I mentioned previously.
Update 2: The error is resolved. It was only occurring in Mac OS X. It had something to do with the way I was accessing the output. Once I created a function in the OpenCL_KNN namespace that returns the output, it worked perfectly.
I’m not familiar with the C++ wrapper for CL, but I’m guessing that TestCount*TrainCount is a factor of sizeof(cl_float) too few bytes to be reading back from the GPU. Your code should be:
queue->enqueueReadBuffer(*output_buf, CL_TRUE, 0, sizeof(cl_float)*TestCount*TrainCount, output);
That doesn’t explain your segfault, however. Perhaps you are making the same error elsewhere?
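As a sanity check on the numbers in the question: the size argument of a buffer read is a byte count, so passing the element count (2250) only covers 2250 / 4 = 562 whole floats, which matches the 562 you saw. Here is a minimal sketch of that arithmetic. It does not touch OpenCL at all; cl_float_t is a hypothetical stand-in for cl_float (assumed to be a 4-byte float), and bytes_for and floats_filled are helper names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cl_float, assumed here to be a 4-byte float.
using cl_float_t = float;

// Bytes needed to hold `count` floats. This is the kind of value the
// size argument of a buffer read expects.
constexpr std::size_t bytes_for(std::size_t count) {
    return count * sizeof(cl_float_t);
}

// Number of whole floats that a read of `bytes` bytes actually fills.
constexpr std::size_t floats_filled(std::size_t bytes) {
    return bytes / sizeof(cl_float_t);
}

// Values taken from the question.
constexpr std::size_t TestCount = 150;
constexpr std::size_t TrainCount = 15;

static_assert(sizeof(cl_float_t) == 4, "sketch assumes a 4-byte float");

// Correct read size: 2250 elements * 4 bytes each.
static_assert(bytes_for(TestCount * TrainCount) == 9000,
              "full buffer size in bytes");

// Passing the element count as the byte count covers only 562 floats.
static_assert(floats_filled(TestCount * TrainCount) == 562,
              "elements covered by the short read");
```

Everything past index 561 in the host array is therefore left uninitialized by the short read, which would explain the bad values but not necessarily the crash.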