Could someone explain, in general, how the GPU's shared memory is used by the Matlab Parallel Computing Toolbox? And can I use it explicitly, for example to synchronize the multiprocessors (MPs)?
BTW, I have a GTX 580, which has 1.5 GB of memory, 16 multiprocessors (MPs) with 32 cores each, and 64 KB of shared (L1) memory per MP.
Thanks
I do not know the answer for Matlab, but if you're willing to work in Python, then PyCUDA is your friend. You develop kernel code directly in CUDA-C, written out as long strings in Python. PyCUDA then lets you compile these, set up device variables, send data to and from the device, and execute your kernel with a launch configuration that controls threads per block, blocks per grid, and so on. To use shared memory, you merely declare variables with the `__shared__` keyword in your CUDA-C code-as-Python-string.
I wrote some code for image processing which is linked here. You can unpack it and see how I wrote the CUDA-C source modules as Python strings. With NumPy and SciPy, the rest of the user experience in Python is very similar to Matlab, and in my opinion better. If you're not married to doing this project in Matlab, consider switching it over to PyCUDA.
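To make the idea concrete, here is a minimal sketch of a CUDA-C kernel held in a Python string that uses shared memory. It is not taken from the linked image-processing code. The kernel name `block_sum`, the block size of 256, and the helper functions are all illustrative assumptions. Note that shared memory and `__syncthreads()` only coordinate the threads of a single block, so this does not synchronize separate multiprocessors. That usually means ending the kernel and launching another one.

```python
# CUDA-C kernel held in a Python string, the form PyCUDA compiles.
# The kernel name and sizes are hypothetical, chosen for illustration only.
KERNEL_SOURCE = r"""
#define THREADS_PER_BLOCK 256

__global__ void block_sum(const float *in, float *out, int n)
{
    // One shared-memory buffer per block, visible to that block's threads only.
    __shared__ float cache[THREADS_PER_BLOCK];

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    cache[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();  // barrier for the threads of this block only

    // Tree reduction inside shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = cache[0];
}
"""

THREADS_PER_BLOCK = 256  # must match the #define in KERNEL_SOURCE


def block_sums_reference(values, threads_per_block=THREADS_PER_BLOCK):
    """CPU reference: sum each consecutive chunk of threads_per_block values."""
    return [sum(values[i:i + threads_per_block])
            for i in range(0, len(values), threads_per_block)]


def block_sums_gpu(values):
    """Run the kernel via PyCUDA (needs an NVIDIA GPU, PyCUDA and NumPy).

    Assumes a non-empty input, since a grid of zero blocks cannot be launched.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    data = np.asarray(values, dtype=np.float32)
    n_blocks = (data.size + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    out = np.zeros(n_blocks, dtype=np.float32)

    kernel = SourceModule(KERNEL_SOURCE).get_function("block_sum")
    # drv.In / drv.Out copy the arrays to and from the device around the launch.
    kernel(drv.In(data), drv.Out(out), np.int32(data.size),
           block=(THREADS_PER_BLOCK, 1, 1), grid=(n_blocks, 1))
    return out
```

On a machine with a CUDA-capable card, `block_sums_gpu(range(1000))` should agree with `block_sums_reference(list(range(1000)))` up to float32 rounding. The CPU version is there only so the GPU result can be checked.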