PRAM models for parallel computing come in three main flavours: EREW, CREW, and CRCW.
I can understand how EREW and CREW can be implemented on a multicore machine. But how
would one go about implementing the CRCW model on a multicore CPU? Is it even a practical model, given that concurrent writes are not possible and every basic parallel programming course
goes into great detail about race conditions?
Essentially this means that trying to avoid race conditions and trying to implement concurrent
writes are two opposing goals.
First up: We know that the PRAM is a theoretical, or abstract, machine. It makes several simplifications so that it can be used for analyzing and designing parallel algorithms.
Next, let’s talk about the ways in which one may do ‘concurrent writes’ meaningfully.
Concurrent write memories are usually divided into subclasses, based on how they behave:
Priority-based CW – Processors have a priority, and if multiple concurrent writes to the same location arrive, the write from the processor with the highest priority gets committed to memory.
Arbitrary CW – One processor’s write is arbitrarily chosen for commit.
Common CW – Multiple concurrent writes to the same location are committed only if the values being written are the same, i.e. all writing processors must agree on the value being written.
Reduction CW – A reduction operator is applied to the multiple values being written, e.g. a summation, where multiple concurrent writes to the same location cause the sum of the written values to be committed to memory.
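To make the four policies concrete, here is a minimal Python sketch of how a memory cell might resolve the writes that collide in one time step. The function name resolve_writes, the policy strings, and the convention that the lowest processor id has the highest priority are all assumptions made for illustration; the PRAM model itself does not fix any of them.

```python
import operator
import random
from functools import reduce


def resolve_writes(writes, policy, op=operator.add):
    """Resolve one step of concurrent writes aimed at a single memory cell.

    writes: list of (processor_id, value) pairs.
    Returns the committed value, or None if nothing is committed.
    """
    if not writes:
        return None
    values = [value for _, value in writes]
    if policy == "priority":
        # Assumption: the lowest processor id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "arbitrary":
        # Any single write may win; callers must not rely on which one.
        return random.choice(values)
    if policy == "common":
        # Commit only if every writer agrees on the value.
        return values[0] if all(v == values[0] for v in values) else None
    if policy == "reduction":
        # Combine all written values with the reduction operator.
        return reduce(op, values)
    raise ValueError(f"unknown policy: {policy}")


writes = [(2, 7), (0, 5), (1, 7)]
print(resolve_writes(writes, "priority"))   # processor 0 wins, so 5
print(resolve_writes(writes, "common"))     # writers disagree, so None
print(resolve_writes(writes, "reduction"))  # 7 + 5 + 7 = 19
```

Note that this simulation resolves the collision sequentially; in the model the resolution is part of the memory and costs a single step.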
These subclasses lead to some interesting algorithms. Some of the examples I remember from class are:
A CRCW-PRAM where the concurrent write is achieved as a summation can sum an arbitrarily large number of integers in a single timestep. There is a processor for each integer in the input array. All processors write their value to the same location. Done.
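As a rough illustration of that one-step sum, the sketch below simulates it in Python. The name crcw_sum_step is hypothetical, and the loop only stands in for the memory's combining logic: on the abstract machine all the writes land in the same time step, whereas this simulation (like any ordinary multicore CPU) has to process them one after another.

```python
def crcw_sum_step(values):
    """Simulate one time step of a summation (reduction) CRCW-PRAM."""
    # Processor i issues exactly one write: values[i] to the shared cell.
    pending_writes = list(enumerate(values))

    # The memory, not the processors, combines the colliding writes.
    cell = 0
    for _processor_id, value in pending_writes:
        cell += value
    return cell


print(crcw_sum_step([3, 1, 4, 1, 5]))  # 14
```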
Imagine a CRCW-PRAM where the memory commits concurrent writes only if the value written by all processors is the same. Now imagine N numbers A[1] ... A[N], whose maximum you need to find. Here’s how you’d do it:
Step 1:
N^2 processors will compare each value to each other value, and write the result to a 2D array: processor (i, j) writes 1 to entry (i, j) if A[j] >= A[i], and 0 otherwise.
So in this 2D array, the column corresponding to the biggest number will be all 1’s.
Step 2:
Find the column which has only 1’s, and store the corresponding value as the max.
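The two steps can be simulated with the Python sketch below. The name crcw_max and the per-column flag array is_max are assumptions for illustration, and the way Step 2 is carried out here (every processor that sees a 0 clears its column's flag) is one common way to do it rather than something spelled out above. The point is that all processors writing to the same flag write the same value, 0, so the Common CW rule allows the writes. The nested loops stand in for the N^2 processors, each of which does constant work.

```python
def crcw_max(a):
    """Simulate the constant-time Common-CRCW maximum algorithm."""
    n = len(a)
    if n == 0:
        raise ValueError("crcw_max needs at least one value")

    # Step 1: processor (i, j) writes 1 if a[j] >= a[i], else 0.
    # Every cell has exactly one writer, so there is no conflict here.
    c = [[1 if a[j] >= a[i] else 0 for j in range(n)] for i in range(n)]

    # Step 2: one flag per column, initially 1. Each processor (i, j)
    # that sees a 0 writes 0 to is_max[j]. All writers to a given flag
    # write the same value, which is what Common CW requires.
    is_max = [1] * n
    for i in range(n):
        for j in range(n):
            if c[i][j] == 0:
                is_max[j] = 0

    # Every column still flagged holds the maximum. With ties there are
    # several such columns, but they would all write the same value.
    for j in range(n):
        if is_max[j]:
            return a[j]


print(crcw_max([3, 9, 2, 9, 4]))  # 9
```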
Finally, is it possible to implement for real?
Yes, it is possible. Designing, say, a register file, or a memory and associated logic, which has multiple write ports, and which arbitrates concurrent writes to the same address in a meaningful way (like the ways I described above) is possible. You can probably already see that based on the subclasses I mentioned. Whether or not it is practical, I cannot say. I can say that in my limited experience with computers (which involves mostly using general purpose hardware, like the Core Duo machine I’m currently sitting before), I haven’t seen one in practice.
EDIT: I did find a CRCW implementation. The Wikipedia article on PRAM describes a CRCW machine which can find the max of an array in 2 clock cycles (using the same algorithm as the one above). The description is in SystemVerilog and can be implemented on an FPGA.