I’m developing an application that loads a lot of data (for example, from CSV).
I’m creating a List<List<SimpleCell>> and loading the cells I read into it.
The SimpleCell class contains 5 Strings, and each String has 10 characters on average.
So I figure that if I read 1000 rows, each containing 160 columns, that gives 1000 * 160 = 160 000 SimpleCell instances, which should take about 160 000 * sizeof(SimpleCell) ≈ 160 000 * 10 * 5 = 8 000 000 bytes ≈ 7.63 MB.
But when I look at jconsole (after clicking Perform GC), memory usage is about 790 MB. How can this be?
Note that I don’t keep references to any “temporary” objects.
Here is the code where the memory usage rises:
    for (int i = r.getFromIndex(); i <= r.getToIndex(); ++i) {
        System.out.println("Processing: 'ZZ " + i + "'");
        List<SimpleCell> values = saxRead("ZT/ZZ " + i + "");
        rows.add(values);
    }
saxRead just creates an input stream, parses it with SAX, closes the stream, and returns the cells (created by the SAX handler), so there are only local variables (which I assume will be garbage collected soon).
I’m getting an out-of-heap error when reading 1000 rows, but I need to read approximately 7k.
Obviously, there’s something I don’t know about JVM memory.
So why is memory usage so high when loading this relatively small amount of data?
A String uses 48 bytes plus 2 bytes per character of text (each character is 2 bytes). A SimpleCell object uses 40 bytes, and the List holding one row of them uses 1064 bytes.
This means each row uses 1064 + 160 * 40 + 5 * 160 * (48 + 20) bytes, or about 62 KB. For 1000 rows you would be using about 62 MB, which is much less than what you are seeing.
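As a rough cross-check of this arithmetic, here is a small sketch that plugs the per-object sizes quoted above into a formula, using the 160 columns, 5 Strings per cell, and 10 characters per String from the question. The class and method names are made up for illustration, and the sizes are approximations rather than measured values.

```java
// Rough heap estimate from the approximate per-object sizes quoted in the answer.
// MemoryEstimate and bytesPerRow are hypothetical names used only for this sketch.
class MemoryEstimate {
    static final long STRING_OVERHEAD = 48; // bytes per String, excluding its text
    static final long BYTES_PER_CHAR = 2;   // each character is 2 bytes
    static final long CELL_SIZE = 40;       // one SimpleCell object
    static final long ROW_LIST_SIZE = 1064; // the List holding one row of cells

    static long bytesPerRow(int columns, int stringsPerCell, int charsPerString) {
        long perString = STRING_OVERHEAD + BYTES_PER_CHAR * charsPerString;
        long perCell = CELL_SIZE + stringsPerCell * perString;
        return ROW_LIST_SIZE + columns * perCell;
    }

    public static void main(String[] args) {
        long perRow = bytesPerRow(160, 5, 10);
        System.out.println("bytes per row:       " + perRow);
        System.out.println("bytes for 1000 rows: " + perRow * 1000);
        System.out.println("bytes for 7000 rows: " + perRow * 7000);
    }
}
```

With these inputs the per-row figure comes to 61 864 bytes, so even 7000 rows would be well under the 790 MB reported by jconsole.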
I suggest you use a memory profiler to see exactly how much memory is being used by what, e.g. VisualVM or YourKit.
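If you want a coarse number before reaching for a profiler, you can read the same heap figure jconsole shows from inside the program. This is only a sketch: HeapCheck and usedHeapBytes are hypothetical names, the gc() call is just a request to the JVM, and the result says how much is retained, not what is retaining it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Coarse heap measurement via the platform MemoryMXBean.
// HeapCheck and usedHeapBytes are hypothetical names for this sketch.
class HeapCheck {
    static long usedHeapBytes() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        mem.gc(); // request a collection first, similar to "Perform GC" in jconsole
        return mem.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();

        // Stand-in for the real loading loop: keep some Strings reachable.
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            data.add("cell " + i);
        }

        long after = usedHeapBytes();
        System.out.println("approx. bytes retained: " + (after - before));
        System.out.println("entries kept alive:     " + data.size());
    }
}
```

Calling usedHeapBytes() before and after loading a single row gives a per-row figure you can compare against the estimate; a profiler is still needed to see which objects account for the difference.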
Depending on how you construct the Strings, you may retain even more memory than this. For example, it’s likely you are retaining a reference to the original XML: when you take a substring of it on an older JVM (before Java 7 update 6), the substring shares the original’s backing character array, so the whole original text stays in memory.
You may find this class useful. It will reduce the amount of memory Strings use if they are using more than they need and reduce duplicates using a fixed size cache.
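To illustrate the idea (this is not the class referred to above, just a minimal sketch of the same technique), a compactor can copy each String so it no longer pins a larger buffer and keep a fixed number of recently seen values to return shared instances for duplicates. StringCompactor and compact are hypothetical names; the trimming effect of new String(s) applies to older JVMs where substrings share the original's character array.

```java
// Minimal sketch of a String compactor with a fixed-size de-duplication cache.
// StringCompactor is a hypothetical name, not the class linked in the answer.
class StringCompactor {
    private final String[] cache;

    StringCompactor(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        cache = new String[size];
    }

    String compact(String s) {
        if (s == null) {
            return null;
        }
        // Map the hash to a non-negative slot; collisions simply overwrite.
        int slot = (s.hashCode() & 0x7fffffff) % cache.length;
        String cached = cache[slot];
        if (s.equals(cached)) {
            return cached; // duplicate value: reuse the instance already held
        }
        // Copy so the result does not keep a larger shared buffer alive
        // (relevant on older JVMs where substring shares the original array).
        String copy = new String(s);
        cache[slot] = copy;
        return copy;
    }

    public static void main(String[] args) {
        StringCompactor compactor = new StringCompactor(1024);
        String a = compactor.compact(new String("hello"));
        String b = compactor.compact(new String("hello"));
        System.out.println("same instance: " + (a == b));
    }
}
```

In the SAX handler you would pass each cell value through compact() before storing it in a SimpleCell. Because the cache has a fixed size, it bounds its own memory use at the cost of missing some duplicates.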