Garbage collection involves walking through a list of allocated objects (either all objects or objects in a particular generation) and determining which are reachable.
-
How is this list maintained? Do runtimes for GC languages keep a giant list of all objects?
-
Also, from what I understand, GC involves walking the call stack to look for object references – how does the algorithm distinguish between GC-able pointers and primitive data?
The memory management system keeps track of the size of each allocated object, just as it does in C or C++. One common way to do this is for the memory manager to allocate an extra size_t before each allocation, which records the size of that object. The memory manager likewise has to keep track of the size of each free block, so that it can reuse those blocks for later allocations.

A mark-and-sweep garbage collector works in two phases: the mark phase and the sweep phase. In the mark phase, the garbage collector walks object references in order to find objects that are still reachable. It starts at a few basic places where object references are stored and given names (the stack, global storage, and static storage), and then traverses the references inside those objects.
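The header bookkeeping and the marking described above, together with the heap walk that the sweep phase performs, might be sketched roughly as follows. This is a toy under stated assumptions, not how any particular runtime does it: the names (Header, gc_alloc, mark, sweep, collect) are hypothetical, every object is assumed to be a plain array of pointer slots so the collector knows exactly where the references are, roots are passed in explicitly instead of being found on the stack, and freed blocks are only flagged, not reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header placed before every allocation. A real allocator
   might store only a size_t and hide the flags in its spare bits. */
typedef struct Header {
    size_t size;   /* payload size in bytes */
    int marked;    /* set during the mark phase */
    int free;      /* set once the block has been swept */
} Header;

#define HEAP_BYTES 4096
static unsigned char *heap = NULL;  /* one contiguous toy heap */
static size_t heap_top = 0;         /* next unused byte in the heap */

/* Allocate an object made of nslots pointer slots, all initially NULL. */
static void *gc_alloc(size_t nslots) {
    size_t size = nslots * sizeof(void *);
    if (heap == NULL) heap = malloc(HEAP_BYTES);
    assert(heap != NULL);
    assert(heap_top + sizeof(Header) + size <= HEAP_BYTES);
    Header *h = (Header *)(heap + heap_top);
    h->size = size;
    h->marked = 0;
    h->free = 0;
    void *payload = h + 1;          /* payload starts right after the header */
    memset(payload, 0, size);
    heap_top += sizeof(Header) + size;
    return payload;
}

static Header *header_of(void *payload) {
    return (Header *)payload - 1;
}

/* Mark phase: follow references recursively from one object. */
static void mark(void *payload) {
    if (payload == NULL) return;
    Header *h = header_of(payload);
    if (h->marked) return;          /* already visited; also breaks cycles */
    h->marked = 1;
    void **slots = payload;
    for (size_t i = 0; i < h->size / sizeof(void *); i++)
        mark(slots[i]);
}

/* Sweep phase: walk the heap bottom to top, hopping from header to header
   using the recorded sizes. Returns the number of blocks freed. */
static size_t sweep(void) {
    size_t freed = 0;
    size_t off = 0;
    while (off < heap_top) {
        Header *h = (Header *)(heap + off);
        if (!h->free && !h->marked) {
            h->free = 1;
            freed++;
        }
        h->marked = 0;              /* reset for the next collection */
        off += sizeof(Header) + h->size;
    }
    return freed;
}

/* One full collection from an explicit root set. */
static size_t collect(void **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    return sweep();
}
```

Note that there is no separate "giant list of all objects" in this sketch: the heap itself acts as the list, because each header says where the next allocation begins. Calling collect with an array holding one root frees every block that cannot be reached from that root, including unreachable cycles.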
In the sweep phase, the garbage collector walks the heap from bottom to top, jumping from allocation to allocation based on those size_t values, and frees anything that isn’t marked.

Some languages (like Ruby) tag all of the primitives so that they can be identified separately from object references at runtime. Other garbage collectors are very conservative and follow primitives as though they were object references (though some checks must be performed to make sure that the garbage collector doesn’t stick a mark in the middle of some other object). Still other languages use runtime type information to be more precise about which values they follow.
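To make the first two strategies concrete, here is a small sketch of both. The tagged representation is only in the spirit of what is described above and is not Ruby's actual encoding: the Value type, the low-bit tag, and all function names are assumptions made for illustration. The conservative check shown (accept a word only if it equals the start address of a known allocation) is one simple example of the kind of check a conservative collector might perform; real collectors use faster lookups than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged word: the lowest bit is 1 for a small integer and
   0 for a reference. This works because heap objects are aligned to at
   least 2 bytes, so a genuine pointer never has its lowest bit set. */
typedef uintptr_t Value;

static Value make_int(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }
static int is_int(Value v) { return (v & 1u) != 0; }

/* Relies on arithmetic right shift of negative values, which mainstream
   compilers provide but the C standard leaves implementation-defined. */
static intptr_t int_of(Value v) { return (intptr_t)v >> 1; }

static Value make_ref(void *p) {
    assert(((uintptr_t)p & 1u) == 0);   /* pointer must be aligned */
    return (uintptr_t)p;
}
static int is_ref(Value v) { return v != 0 && (v & 1u) == 0; }
static void *ref_of(Value v) { return (void *)v; }

/* Precise scan: with tags, the collector knows exactly which words in a
   frame are references and can ignore the rest. */
static size_t count_tagged_refs(const Value *words, size_t n) {
    size_t refs = 0;
    for (size_t i = 0; i < n; i++)
        if (is_ref(words[i])) refs++;
    return refs;
}

/* Conservative scan: without tags, any word might be a pointer. Treat a
   word as a reference only if it matches the start of a known allocation,
   so a mark never lands in the middle of some other object. */
static int is_known_object(uintptr_t word, const uintptr_t *starts, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (starts[i] == word) return 1;
    return 0;
}

static size_t count_conservative_refs(const uintptr_t *words, size_t n,
                                      const uintptr_t *starts, size_t nstarts) {
    size_t refs = 0;
    for (size_t i = 0; i < n; i++)
        if (is_known_object(words[i], starts, nstarts)) refs++;
    return refs;
}
```

The trade-off shows up in the two scan functions: the tagged scan never mistakes an integer for a reference, while the conservative scan can keep an object alive just because some unrelated integer happens to equal its address.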
Ruby’s garbage collector is sometimes called “conservative” because it doesn’t check whether the space on the stack is actually being used, so it sometimes keeps dead objects alive by following ghost references on the stack. But since it always knows exactly whether the data it’s looking at is a reference or a primitive, I don’t call it conservative here.