So I was reading Peter Norvig’s IAQ (infrequently asked questions – link) and stumbled upon this:
You might be surprised to find that an
Object takes 16 bytes, or 4 words, in
the Sun JDK VM. This breaks down as
follows: There is a two-word header,
where one word is a pointer to the
object’s class, and the other points
to the instance variables. Even though
Object has no instance variables, Java
still allocates one word for the
variables. Finally, there is a
“handle”, which is another pointer to
the two-word header. Sun says that
this extra level of indirection makes
garbage collection simpler. (There
have been high performance Lisp and
Smalltalk garbage collectors that do
not use the extra level for at least
15 years. I have heard but have not
confirmed that the Microsoft JVM does
not have the extra level of
indirection.)An empty new String() takes 40 bytes,
or 10 words: 3 words of pointer
overhead, 3 words for the instance
variables (the start index, end index,
and character array), and 4 words for
the empty char array. Creating a
substring of an existing string takes
“only” 6 words, because the char array
is shared. Putting an Integer key and
Integer value into a Hashtable takes
64 bytes (in addition to the four
bytes that were pre-allocated in the
Hashtable array): I’ll let you work
out why.
So well I obviously tried, but I can’t figure it out. In the following I only count words:
A Hashtable put creates one Hashtable$Entry: 3 (overhead) + 4 variables (3 references which I assume are 1 word + 1 int). I further assume that he means that the Integers are newly allocated (so not cached by the Integer class or already exist) which comes to 2* (3 [overhead] + 1 [1 int value]).
So in the end we end up with.. 15 words or 60bytes. So what I first thought was that the Entry as a inner class needs a reference to its outer object, but alas it’s static so that doesn’t make much sense (sure we have to store a pointer to the parent class, but I’d think that information is stored in the class header by the VM).
Just idle curiosity and I’m well aware that all this depends to a good bit on the actual JVM implementation (and on a 64bit version the results would be different), but still I don’t like questions I can’t answer 🙂
Edit: Just to make this a bit clearer: While I’m well aware that more compact structures can get us some performance benefits, I agree that in general worrying about a few bytes here or there is a waste of time. I surely wouldn’t stop using a Hashtable just because of a few bytes overhead here or there just like I wouldn’t use plain char arrays instead of Strings (or start using C). This is purely of academic interest to learn a bit more about the insides of Java/the JVM 🙂
The author appears to assume there is 3 Objects with 16 bytes overhead each and 2 32-bit references in the Map.Entry and 2 x 1 32-bit
intvalues. This would total 64-bytesThis is flawed in that Sun/Oracle’s JVM only allocates on 8-byte boundaries so that while technically an Integer occupies 20 bytes of memory, 24 bytes is used (the next multiple of 8)
Additionally many JVMs now use 64-bit references so the Map.Entry would use another 16 bytes.
This is all very inefficient, which is why you might use a class like TIntIntHashMap instead which use primitives.
However, usually it doesn’t matter as memory is surprising cheap when you compare it to the cost of your time. If you work on server applications and you cost your company about $40/hour, you need to be saving about 10 MB every minute to save as much memory as you are costing. (Ideally you need to be saving much more than this) Saving 10 MB each and every minute is hard.
Memory is reusable, but your time isn’t.