Assume you have a reference-counted object in shared memory. The reference count represents the number of processes using the object, and processes are responsible for incrementing and decrementing the count via atomic instructions, so the reference count itself is in shared memory as well. (It could be a field of the object, or the object could contain a pointer to the count; I’m open to suggestions if they help solve this problem.) Occasionally, a process will have a bug that prevents it from decrementing the count. How do you make it as easy as possible to figure out which process is not decrementing the count?
One solution I’ve thought of is giving each process a UID (maybe their PID). Then when processes decrement, they push their UID onto a linked list stored alongside the reference count (I chose a linked list because you can atomically append to head with CAS). When you want to debug, you have a special process that looks at the linked lists of the objects still alive in shared memory, and whichever apps’ UIDs are not in the list are the ones that have yet to decrement the count.
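For concreteness, here is a rough sketch of what I mean by the linked-list idea, using C11 atomics. All the names (node_pool, log_node, logged_hdr, POOL_SIZE, NIL) are made up for illustration. It assumes the nodes come from a fixed pool inside the shared region and are linked by index rather than by pointer, since the region may be mapped at a different address in each process, and it assumes 32-bit atomics are lock-free on the target. Nodes are never freed, so the CAS push does not have to worry about ABA.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NIL UINT32_MAX   /* "no node" marker (hypothetical) */
#define POOL_SIZE 1024   /* arbitrary pool capacity for this sketch */

typedef struct {
    uint32_t uid;   /* UID of the process that decremented */
    uint32_t next;  /* index of the next node, or NIL */
} log_node;

typedef struct {
    _Atomic uint32_t next_free;   /* bump allocator: next unused node index */
    log_node nodes[POOL_SIZE];
} node_pool;

typedef struct {
    _Atomic uint32_t refcount;
    _Atomic uint32_t head;        /* index of the newest log node, or NIL */
} logged_hdr;

static void logged_init(logged_hdr *h, uint32_t initial_count) {
    atomic_init(&h->refcount, initial_count);
    atomic_init(&h->head, NIL);
}

/* Record uid in the object's list, then drop one reference.
   Returns 1 if the UID was logged, 0 if the pool was exhausted. */
static int log_decrement(node_pool *pool, logged_hdr *h, uint32_t uid) {
    int logged = 0;
    uint32_t idx = atomic_fetch_add(&pool->next_free, 1);
    if (idx < POOL_SIZE) {
        log_node *n = &pool->nodes[idx];
        n->uid = uid;
        uint32_t old = atomic_load(&h->head);
        do {
            n->next = old;  /* link to the current head, retry if it moved */
        } while (!atomic_compare_exchange_weak(&h->head, &old, idx));
        logged = 1;
    }
    uint32_t prev = atomic_fetch_sub(&h->refcount, 1);
    assert(prev > 0);  /* the count must not underflow */
    (void)prev;
    return logged;
}

/* Debugger side: has this UID logged a decrement on the object? */
static int has_decremented(const node_pool *pool, logged_hdr *h, uint32_t uid) {
    for (uint32_t i = atomic_load(&h->head); i != NIL; i = pool->nodes[i].next) {
        if (pool->nodes[i].uid == uid) {
            return 1;
        }
    }
    return 0;
}
```

The debugging process would call has_decremented for every UID it expects to have used the object; any UID that returns 0 is a candidate.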
The disadvantage to this solution is that it has O(N) memory usage where N is the number of processes. If the number of processes using the shared memory area is large, and you have a large number of objects, this quickly becomes very expensive. I suspect there might be a halfway solution where with partial fixed size information you could assist debugging by somehow being able to narrow down the list of possible processes even if you couldn’t pinpoint a single one. Or if you could just detect which process hasn’t decremented when only a single process hasn’t (i.e. unable to handle detection of 2 or more processes failing to decrement the count) that would probably still be a big help.
(There are more ‘human’ solutions to this problem, like making sure all applications use the same library to access the shared memory region, but if the shared area is treated as a binary interface and not all processes are going to be applications written by you, that’s out of your control. Also, even if all apps use the same library, one app might have a bug outside the library corrupting memory in such a way that it’s prevented from decrementing the count. Yes, I’m using an unsafe language like C/C++ 😉)
Edit: In single process situations, you will have control, so you can use RAII (in C++).
You could do this using only a single extra integer per object.
Initialise the integer to zero. When a process increments the reference count for the object, it XORs its PID into the integer (that is, tracker ^= pid, performed atomically).
When a process decrements the reference count, it does the same.
If the reference count is ever left at 1, then the tracker integer will be equal to the PID of the process that incremented it but didn’t decrement it.
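A minimal sketch of the scheme in C11 might look like the following. The names (shared_hdr, hdr_acquire, hdr_release, hdr_suspect) are invented for illustration, and it assumes 32-bit atomics are lock-free on your platform so they behave correctly across processes. Note that the count and the tracker are updated by two separate atomic operations, so a reader can briefly see them out of step; for inspecting an object that is already stuck, that should not matter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical header stored in shared memory alongside each object. */
typedef struct {
    _Atomic uint32_t refcount;  /* number of processes holding a reference */
    _Atomic uint32_t tracker;   /* XOR of the PIDs of all current holders */
} shared_hdr;

static void hdr_init(shared_hdr *h) {
    atomic_init(&h->refcount, 0);
    atomic_init(&h->tracker, 0);
}

/* Take a reference: bump the count and XOR our PID into the tracker. */
static void hdr_acquire(shared_hdr *h, uint32_t pid) {
    atomic_fetch_add(&h->refcount, 1);
    atomic_fetch_xor(&h->tracker, pid);
}

/* Drop a reference: XOR our PID out again and decrement the count. */
static void hdr_release(shared_hdr *h, uint32_t pid) {
    atomic_fetch_xor(&h->tracker, pid);
    uint32_t prev = atomic_fetch_sub(&h->refcount, 1);
    assert(prev > 0);  /* the count must not underflow */
    (void)prev;
}

/* If exactly one reference is outstanding, the tracker holds the PID of
   the process that still owns it. Returns 0 in every other case. */
static uint32_t hdr_suspect(shared_hdr *h) {
    if (atomic_load(&h->refcount) != 1) {
        return 0;
    }
    return atomic_load(&h->tracker);
}
```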
This works because XOR is associative and commutative ((A ^ B) ^ C == A ^ (B ^ C) and A ^ B == B ^ A), so the XORs can be regrouped in any order. If a process XORs the tracker with its own PID an even number of times, each pair is the same as XORing the tracker with PID ^ PID, which is zero and leaves the tracker value unaffected.
You could alternatively use an unsigned value (which is defined to wrap rather than overflow), adding the PID when incrementing the usage count and subtracting it when decrementing it.
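The add/subtract variant can be sketched the same way. Again the names (sum_hdr, sum_acquire, sum_release) are hypothetical, and lock-free 32-bit atomics are assumed. Because unsigned arithmetic wraps modulo 2^32, a matching add and subtract cancel out even if the running sum wrapped in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical header: same idea as the XOR tracker, but using a sum. */
typedef struct {
    _Atomic uint32_t refcount;  /* number of processes holding a reference */
    _Atomic uint32_t pid_sum;   /* sum of holder PIDs, modulo 2^32 */
} sum_hdr;

static void sum_init(sum_hdr *h) {
    atomic_init(&h->refcount, 0);
    atomic_init(&h->pid_sum, 0);
}

/* Take a reference: bump the count and add our PID to the sum. */
static void sum_acquire(sum_hdr *h, uint32_t pid) {
    atomic_fetch_add(&h->refcount, 1);
    atomic_fetch_add(&h->pid_sum, pid);
}

/* Drop a reference: subtract our PID again and decrement the count. */
static void sum_release(sum_hdr *h, uint32_t pid) {
    atomic_fetch_sub(&h->pid_sum, pid);
    uint32_t prev = atomic_fetch_sub(&h->refcount, 1);
    assert(prev > 0);  /* the count must not underflow */
    (void)prev;
}
```

As with the XOR version, when the count is stuck at 1 the value left in pid_sum is the PID of the one process that has not released.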