I’m implementing an algorithm in C/C++ to process some vectors, and I thought it would be a good idea to make it parallel since I’m working with a multicore CPU. I have some experience with GPGPU, where bad memory access patterns can ruin performance entirely. Do I need to consider any special memory access layout across the cores on a CPU as well?
Thanks
There are a number of memory-related problems you can run into with a multiprocessor setup, and some of them can slow an application to a crawl.
You need to be roughly aware of the cache line size on your box and aim for two things:
(The above two rules also apply to data pages, if you’re dealing with large data structures that must be paged.)
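As a rough illustration of "being aware of the cache line size", here is a minimal sketch of how you might get at that number in C++17 and align a structure to it. The names `kLineSize` and `WorkerState` are made up for this example, and the 64-byte fallback is an assumption (typical for x86-64, not guaranteed elsewhere).

```cpp
#include <cstddef>
#include <new>

// Use the standard library constant where it is provided; otherwise fall
// back to 64 bytes. The fallback is an assumption, not a measured value.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLineSize = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLineSize = 64;
#endif

// Hypothetical per-thread record. alignas makes every instance start on a
// cache line boundary, and sizeof is rounded up to a multiple of kLineSize.
struct alignas(kLineSize) WorkerState {
    float accumulator = 0.0f;
    std::size_t items_done = 0;
};

static_assert(alignof(WorkerState) == kLineSize,
              "WorkerState should start on a cache line boundary");
```

On Linux you can also check the value for your particular machine with `getconf LEVEL1_DCACHE_LINESIZE`.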
Where possible, set up separate working data structures (especially heap) for each thread, rather than sharing the data. Especially beware of having a common counter that all threads update: every update forces the cache line holding it to bounce between cores. And (obviously) avoid locks and semaphores except at critical junctures where you absolutely need to synchronize threads.
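To make the "separate data per thread, no common counter" advice concrete, here is a small sketch that sums a vector with one private accumulator per thread instead of a shared one. `parallel_sum`, `PaddedSum` and `kCacheLine` are hypothetical names for this example, the 64-byte line size is an assumption, and it needs C++17 so that `std::vector` honours the over-alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache line size; 64 bytes is typical on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// One result slot per thread, padded so that neighbouring slots never
// share a cache line.
struct alignas(kCacheLine) PaddedSum {
    std::uint64_t value = 0;
};

// Hypothetical helper: sums `data` using `num_threads` worker threads.
std::uint64_t parallel_sum(const std::vector<std::uint32_t>& data,
                           unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<PaddedSum> partial(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    // Contiguous chunks: each thread streams through its own memory range.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            std::uint64_t local = 0;  // thread-private accumulator
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t].value = local;  // one write, to this thread's own line
        });
    }
    // join() is the only synchronization point.
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& p : partial) total += p.value;
    return total;
}
```

Called as `parallel_sum(v, std::thread::hardware_concurrency())`, each thread only reads its own slice of `v` and writes once to its own padded slot, so no locks or atomics are needed until the final join.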