I was just reading the book Clean Code and came across this statement:
When Java was young Doug Lea wrote the seminal book[8] Concurrent
Programming in Java. Along with the book he developed several
thread-safe collections, which later became part of the JDK in the
java.util.concurrent package. The collections in that package are safe
for multithreaded situations and they perform well. In fact, the
ConcurrentHashMap implementation performs better than HashMap in
nearly all situations. It also allows for simultaneous concurrent
reads and writes, and it has methods supporting common composite
operations that are otherwise not thread safe. If Java 5 is the
deployment environment, start with
ConcurrentHashMap.
Note that in the quote above I replaced each of the author’s references with “[n]”, where n is the reference number. As you can see, he gave no reference for the bold part, the claim that ConcurrentHashMap performs better than HashMap in nearly all situations.
It’s not that I don’t believe this statement, but I would love to see the supporting evidence. Does anyone know of any resources that show performance statistics for both ConcurrentHashMap and HashMap? Or can anyone explain to me why ConcurrentHashMap is faster than HashMap?
I probably will look into ConcurrentHashMap’s implementation at work when I’m taking a break, but for now I would like to hear the answers from fellow SOers.
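For readers unfamiliar with the "composite operations" the quote mentions, here is a minimal sketch of two of them, assuming a simple string registry. The class name CompositeOpsSketch and the helper register are hypothetical names made up for illustration; putIfAbsent and the three-argument replace are ConcurrentMap methods available since Java 5.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example class, not taken from the book or the JDK.
class CompositeOpsSketch {
    // On a plain HashMap shared between threads, the check-then-act sequence
    //     if (!map.containsKey(key)) map.put(key, value);
    // is a race: two threads can both pass the check and both write.
    // putIfAbsent performs the check and the insert as one atomic step.
    static String register(ConcurrentMap<String, String> registry, String key, String value) {
        String previous = registry.putIfAbsent(key, value);
        // null means no mapping existed, so our value was stored.
        return previous == null ? value : previous;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> registry = new ConcurrentHashMap<String, String>();
        System.out.println(register(registry, "a", "first"));  // prints "first"
        System.out.println(register(registry, "a", "second")); // prints "first"
        // replace(key, expected, update) only succeeds if the current value equals expected.
        System.out.println(registry.replace("a", "first", "third")); // prints "true"
    }
}
```

This only illustrates the API shape; it says nothing about relative performance.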
Doug Lea is extremely good at these things, so I wouldn’t be surprised if at one time his ConcurrentHashMap performed better than Joshua Bloch’s HashMap. However, as of Java 7, the first @author of HashMap is Doug Lea too. Obviously, there is now no reason HashMap would be any slower than its concurrent cousin.
Out of curiosity, I ran a benchmark anyway, under Java 7. The more entries there are, the closer the performance gets. Eventually ConcurrentHashMap is within 3% of HashMap, which is quite remarkable. The bottleneck is really memory access; as the saying goes, "memory is the new disk (and disk is the new tape)". If the entries are in the cache, both will be fast; if the entries don’t fit in the cache, both will be slow. In real applications, a map doesn’t have to be big to compete with other data for space in the cache. If a map is used often, it stays cached; if not, it doesn’t. That is the real determining factor, not the implementation (given that both are implemented by the same expert).
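The benchmark code itself is not shown above. As a rough idea of what such a comparison could look like, here is a minimal single-threaded sketch, assuming Integer keys and a simple fill-then-read workload. The class name MapBenchmarkSketch, the method timePutGet, and the entry count are all hypothetical choices, and this is not a rigorous harness: a tool such as JMH handles JIT warm-up and dead-code elimination far more carefully.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical micro-benchmark sketch; not the code used for the numbers above.
class MapBenchmarkSketch {
    // Puts n entries into the map, reads each one back, and returns elapsed nanoseconds.
    static long timePutGet(Map<Integer, Integer> map, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(i);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the reads cannot be optimized away, and sanity-check it.
        if (sum != (long) n * (n - 1) / 2) {
            throw new AssertionError("unexpected checksum: " + sum);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // assumed size; vary it to see cache effects
        // Several rounds: the early ones mostly serve as JIT warm-up.
        for (int round = 0; round < 5; round++) {
            long hashMapNanos = timePutGet(new HashMap<Integer, Integer>(), n);
            long concurrentNanos = timePutGet(new ConcurrentHashMap<Integer, Integer>(), n);
            System.out.printf("round %d: HashMap %d ms, ConcurrentHashMap %d ms%n",
                    round, hashMapNanos / 1_000_000, concurrentNanos / 1_000_000);
        }
    }
}
```

Results will vary with JVM version, heap settings, and hardware, so treat any numbers from a sketch like this as indicative at best.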