An app (Spring, JPA Hibernate, Sybase 12, Webapp) when run locally on startup consumes 40MB of the 256MB heap space based on VisualVM. When I trigger a search that returns 70,000+ rows (text data no blobs) the heap space graph shoots up to 256MB and throws out of memory. I have resolved this by using setMaxResults(limit). However, when I queried the same data, copy-pasted to a text file and saved to the filesystem, I can see that the size is only 26MB worth of text.
So in effect, 216MB(from 256-40) is consumed by loading a 26MB amount of text from the databases, who is consuming the 190MB by the time out of memory occurs? Perhaps it would be the frameworks, but I don’t see how it can consume more than the actual data being loaded…
**note again that I resolved this with the setMaxResults(limit), my question is NOT what to do but rather why, for educational purposes.
Some things to consider:
Your operating system probably uses an 8bit per character encoding to store the text file. Java strings internally are all encoded at 16 bits per character, double the space right there.
Numbers with only a few digits will be smaller encoded as text than numbers. e.g., ‘1’ is a one byte character in your text file, but a long with the value 1 is eight times that size in memory.
There will be duplication from hibernate taking values out of the SQL result set and mapping it onto your java objects. It may need to wrap/translate the contents of the result set into the types you defined on your mapping.
If your data-per-entity is actually small with a large number of entities, then the ratio of object overhead size to data size will obviously be high.
If you have small pieces of data in collections, the size of the collection can add up quick relative to the data. In extreme example, if you have LinkedList of one or two character strings, that’s 192bits consumed just by pointers for every 16-32 bits of actual data. In an array list it would still be 64 bits for the pointer to point to 16-32 bits of data. (assuming 64 bit OS of course.)
Every object you load in hibernate is “tracked” for dirty checking in what’s called the L1 cache. There can indeed quite a bit of overhead to the internal data structures and instrumentation used to do this relative to data size for large numbers of entities with small amounts of data.
—
So the 26MB of data is already 52MB of data in memory in java, assuming it is all strings, no numbers, no dates, it will be bigger otherwise.
And then if it’s split into many small pieces, 700,000 small strings rather than 1,000 really long ones, it is totally reasonable for the size of data structure overhead to be triple the size of the actual data, pushing you over 200MB easily.