I have a very weird problem with GC in Java. I am running the following piece of code:
```java
while (some condition) {
    // do a lot of work...
    logger.info("Generating resulting time series...");
    Collection<MetricTimeSeries> allSeries = manager.getTimeSeries();
    logger.info(String.format("Generated %,d time series! Storing in files now...", allSeries.size()));
    //for (MetricTimeSeries series : allSeries) {
    //    // just empty loop
    //}
}
```
When I look in JConsole at the start of every loop iteration, my old gen heap usage drops to about 90 MB after I manually force a GC. If I uncomment the loop, like this:
```java
while (some condition) {
    // do a lot of work...
    logger.info("Generating resulting time series...");
    Collection<MetricTimeSeries> allSeries = manager.getTimeSeries();
    logger.info(String.format("Generated %,d time series! Storing in files now...", allSeries.size()));
    for (MetricTimeSeries series : allSeries) {
        // just empty loop
    }
}
```
then even if I force a GC, it won’t fall below 550 MB. According to the YourKit profiler, the time series objects are still reachable through the main thread’s local variable (the collection) just after the GC at the start of a new iteration, and the collection is huge (250K time series). Why is this happening, and how can I “fight” this (incorrect?) behaviour?
Since you’re building a (large) `ArrayList` of time series, it will occupy the heap as long as it’s referenced, and it will get promoted to the old generation if it stays alive long enough (or if the young generation is too small to actually hold it). I’m not sure how you’re associating the information you’re seeing in JConsole or YourKit with a specific point in the program, but until the empty loop has been optimized by several JIT passes, your `while` loop will take longer and keep the collection alive longer, which might explain the perceived difference while there’s actually not much of one.

There’s nothing incorrect about that behaviour. If you don’t want to consume so much memory, you need to change your `Collection` so it’s not an eagerly filled `ArrayList` but a lazy collection, more like a stream (if you’ve ever done XML processing, think DOM vs SAX), which gets evaluated as it’s iterated. If you don’t need the whole collection to be sorted, that’s doable, especially since you seem to be saying that the collection is a concatenation of sub-collections returned by underlying objects.

If you can change your return type from `Collection` to `Iterable`, you could for example use Guava’s `FluentIterable.transformAndConcat()` to transform the collection of underlying objects into a lazily evaluated `Iterable` concatenation of their time series. Of course, the size of the collection is then no longer directly available (and if you try to get it independently of the iteration, you’ll evaluate the lazy collection twice).
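To make the lazy-concatenation idea concrete without pulling in Guava, here is a minimal, hand-rolled sketch of the same technique using only the JDK. It assumes each underlying object can hand out its own sub-collection of time series. `concatLazily`, `LazyConcat` and the sample data are hypothetical names for illustration, not part of your code or of any library. Each sub-collection is fetched only when iteration reaches it, so no aggregated list holding all 250K series is ever built (the sub-collections themselves still live wherever their owners keep them).

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class LazyConcat {

    /**
     * Hypothetical helper: returns an Iterable that lazily concatenates the
     * Iterables produced by applying {@code extractor} to each source element.
     * Nothing is copied into an intermediate list; the extractor is only
     * invoked when iteration advances past the previous sub-collection.
     */
    public static <S, T> Iterable<T> concatLazily(
            Iterable<S> sources,
            Function<? super S, ? extends Iterable<? extends T>> extractor) {
        // Each call to iterator() starts a fresh pass over the sources,
        // so iterating twice means evaluating everything twice.
        return () -> new Iterator<T>() {
            private final Iterator<S> outer = sources.iterator();
            private Iterator<? extends T> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Skip ahead to the next non-empty sub-collection, if any.
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = extractor.apply(outer.next()).iterator();
                }
                return inner.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the "underlying objects", each owning a sub-collection.
        List<List<String>> sources = List.of(List.of("a", "b"), List.of(), List.of("c"));
        for (String series : concatLazily(sources, s -> s)) {
            System.out.println(series);
        }
    }
}
```

With Guava on the classpath, the equivalent would be roughly `FluentIterable.from(sources).transformAndConcat(Source::getTimeSeries)`, where `Source::getTimeSeries` stands for a hypothetical accessor on your underlying objects. In both versions, counting the elements means iterating, which is the "evaluate the lazy collection twice" caveat above.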