I recently read the excellent article ‘The Transactional Memory / Garbage Collection Analogy’ by Dan Grossman. One sentence really caught my attention:
In theory, garbage collection can improve performance by increasing spatial locality (due to object-relocation), but in practice we pay a moderate performance cost for software engineering benefits.
Until then, I had only a vague sense of the issue. Over and over, you see claims that GC can be more efficient, so I always kept that notion in the back of my head. After reading this, however, I started having serious doubts.
As an experiment to measure the cost of GC, some researchers took some Java programs, traced their execution, and then replaced garbage collection with explicit memory management. According to this review of the article on Lambda the Ultimate, they found that GC was always slower. Virtual memory issues made GC look even worse, since the collector regularly touches far more memory pages than the program itself uses at that point, and therefore causes a lot of swapping.
This is all experimental to me. Has anybody, in particular in the context of C++, performed a comprehensive benchmark of GC performance compared to explicit memory management?
It would be particularly interesting to compare how, for example, various big open-source projects perform with and without GC. Has anybody heard of such results before?
EDIT: And please focus on the performance problem, not on why GC exists or why it is beneficial.
Cheers,
Carl
PS. In case you’re already pulling out the flame-thrower: I am not trying to disqualify GC, I’m just trying to get a definitive answer to the performance question.
This turns into another flamewar with a lot of ‘my gut feeling’. Some hard data for a change (papers contain details, benchmarks, graphs, etc.):
http://www.cs.umass.edu/~emery/pubs/04-17.pdf says:
‘Conclusion. The controversy over garbage collection’s performance impact has long overshadowed the software engineering benefits it provides. This paper introduces a tracing and simulation-based oracular memory manager. Using this framework, we execute a range of unaltered Java benchmarks using both garbage collection and explicit memory management. Comparing runtime, space consumption, and virtual memory footprints, we find that when space is plentiful, the runtime performance of garbage collection can be competitive with explicit memory management, and can even outperform it by up to 4%. We find that copying garbage collection can require six times the physical memory as the Lea or Kingsley allocators to provide comparable performance.’
When you have enough memory, copying GC becomes faster than explicit free(): http://www.cs.ucsb.edu/~grze/papers/gc/appel87garbage.pdf
It also depends on what language you use. Java will have to do a lot of rewriting (stack, objects, generations) on each collection, and writing a multithreaded GC that doesn’t have to stop the world in the JVM would be a great achievement. On the other hand, you get that almost for free in Haskell, where GC time will rarely be >5%, while alloc time is almost 0. It really depends on what you’re doing and in what environment.
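To make the "enough memory" claim above more concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified semispace copying collector whose work per collection is proportional to the live data only, so the amortized cost per allocated word shrinks as the heap grows. The function name, the cost constants, and the fixed per-word cost for explicit free() are all hypothetical values chosen for illustration; they are not measurements and are not taken from either paper.

```python
def copying_gc_cost_per_word(heap_words, live_words, copy_cost=10.0):
    """Amortized collector cost per allocated word (hypothetical model).

    Assumes a semispace copying collector: only half the heap is usable,
    and each collection costs copy_cost units per live word copied.
    """
    semispace = heap_words / 2.0
    if live_words >= semispace:
        raise ValueError("live data must fit in one semispace")
    # Words that can be allocated before the next collection is triggered.
    allocated_between_collections = semispace - live_words
    # One collection copies all live words; spread that over the allocations.
    return copy_cost * live_words / allocated_between_collections


# Hypothetical fixed cost of one explicit free() per word, for comparison.
FREE_COST = 2.0

for heap in (4_000, 22_000, 202_000):
    cost = copying_gc_cost_per_word(heap, live_words=1_000)
    verdict = "cheaper than free()" if cost < FREE_COST else "more expensive than free()"
    print(f"heap={heap:>7} words  gc cost/word={cost:.3f}  {verdict}")
```

In this toy model the crossover point depends entirely on the assumed constants, and the model ignores paging, cache effects, and allocation cost, which is exactly where the first paper finds GC losing when memory is tight.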