On Linux I’m using shmget and shmat to set up a shared memory segment that one process will write to and one or more processes will read from. The data that is being shared is a few megabytes in size and when updated is completely rewritten; it’s never partially updated.
I have my shared memory segment laid out as follows:
-------------------------
| t0 | actual data | t1 |
-------------------------
where t0 and t1 are copies of the time when the writer began its update (with enough precision that successive updates are guaranteed to have differing times). The writer first writes to t1, then copies in the data, then writes to t0. The reader, on the other hand, reads t0, then the data, then t1. If the reader gets the same value for t0 and t1, it considers the data consistent and valid; if not, it tries again.
Does this procedure ensure that if the reader thinks the data is valid then it actually is?
Do I need to worry about out-of-order execution (OOE)? If so, would the reader using memcpy to get the entire shared memory segment overcome the OOE issues on the reader side? (This assumes that memcpy performs its copy linearly and ascending through the address space. Is that assumption valid?)
Joe Duffy gives the exact same algorithm and calls it: “A scalable reader/writer scheme with optimistic retry”.
It works.
You need two sequence number fields.
You need to read and write them in opposite order.
You might need to have memory barriers in place, depending on the memory ordering guarantees of the system.
Specifically, you need read acquire and store release semantics for the readers and writers when they access t0 or t1 for reading and writing respectively.
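To make the ordering requirements concrete, here is a minimal sketch in C11, assuming a compiler that provides <stdatomic.h>. The names segment, segment_write, segment_try_read and PAYLOAD_WORDS are hypothetical and chosen only for illustration. Two things in it are my assumptions rather than part of the scheme as described: the payload is copied word by word with relaxed atomic accesses (a plain memcpy racing with the writer is formally a data race in ISO C), and explicit fences sit between the t1 accesses and the data accesses, because under the C11 model an acquire load or release store on t1 alone does not order it against the data in the direction needed here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_WORDS 1024 /* hypothetical size; the real payload is a few MB */

/* Atomics shared between processes should be lock-free (address-free). */
static_assert(ATOMIC_LLONG_LOCK_FREE == 2, "64-bit atomics must be lock-free");

/* Hypothetical layout mirroring | t0 | actual data | t1 |. */
struct segment {
    _Atomic uint64_t t0;
    _Atomic uint64_t data[PAYLOAD_WORDS];
    _Atomic uint64_t t1;
};

/* Writer: t1 first, then the data, then t0. 'stamp' must differ per update. */
void segment_write(struct segment *seg, const uint64_t *src, uint64_t stamp)
{
    atomic_store_explicit(&seg->t1, stamp, memory_order_relaxed);
    /* Keep the data stores from becoming visible before the t1 store. */
    atomic_thread_fence(memory_order_release);
    for (size_t i = 0; i < PAYLOAD_WORDS; i++)
        atomic_store_explicit(&seg->data[i], src[i], memory_order_relaxed);
    /* Store-release: all data stores are visible before t0 changes. */
    atomic_store_explicit(&seg->t0, stamp, memory_order_release);
}

/* Reader: t0 first, then the data, then t1.
 * Returns 1 if the copy in dst is consistent, 0 if the caller should retry. */
int segment_try_read(struct segment *seg, uint64_t *dst)
{
    /* Load-acquire: the data loads cannot move above this load. */
    uint64_t first = atomic_load_explicit(&seg->t0, memory_order_acquire);
    for (size_t i = 0; i < PAYLOAD_WORDS; i++)
        dst[i] = atomic_load_explicit(&seg->data[i], memory_order_relaxed);
    /* Keep the t1 load from being satisfied before the data loads. */
    atomic_thread_fence(memory_order_acquire);
    uint64_t last = atomic_load_explicit(&seg->t1, memory_order_relaxed);
    return first == last;
}
```

A reader would call segment_try_read in a loop until it returns 1. On x86/x64 these fences and acquire/release accesses should compile to ordinary loads and stores; their job there is to constrain the compiler.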
Which instructions are needed to achieve this depends on the architecture. E.g. on x86/x64, because of the relatively strong ordering guarantees, one needs no machine-specific barriers at all in this specific case*.
* One still needs to ensure that the compiler/JIT does not mess around with loads and stores, e.g. by using volatile (which has a different meaning in Java and C# than in ISO C/C++). Compilers may differ, however. E.g. with VC++ 2005 or above, using volatile would make the above safe; see the “Microsoft Specific” section. It can be done with other compilers as well on x86/x64. The assembly code emitted should be inspected, and one must make sure that accesses to t0 and t1 are not eliminated or moved around by the compiler.
As a side note, if you ever need MFENCE, lock or [TopOfStack],0 might be a better option, depending on your needs.
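For comparison, here is a small C11 sketch of the two kinds of barrier mentioned above, again assuming <stdatomic.h> is available. The function names are hypothetical. atomic_signal_fence only restricts the compiler and emits no instruction. For atomic_thread_fence(memory_order_seq_cst), x86/x64 compilers typically emit either MFENCE or a locked read-modify-write on the stack, depending on the compiler and version, so inspect the generated assembly if the choice matters. The reader/writer scheme in this answer does not need a full fence on x86/x64; the case that does is a store followed by a load from a different location, as in a Dekker-style handshake.

```c
#include <assert.h>
#include <stdatomic.h>

/* Flags shared between processes should be lock-free (address-free). */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics must be lock-free");

/* Compiler-only barrier: stops the compiler from moving memory accesses
 * across this point, but emits no CPU instruction. */
void compiler_barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Full barrier: also orders an earlier store against a later load
 * at the hardware level. */
void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical Dekker-style step: announce ourselves, then look at the peer.
 * Without the full barrier, x86/x64 may satisfy the load before the store
 * has become visible to other cores. */
int announce_and_check_peer(atomic_int *mine, atomic_int *peer)
{
    atomic_store_explicit(mine, 1, memory_order_relaxed);
    full_barrier();
    return atomic_load_explicit(peer, memory_order_relaxed);
}
```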