We have a Nagios check that monitors heap memory on some Tomcat instances. It uses the following command to get metrics back from the JVM:
java -jar /usr/java/cmdline-jmxclient-0.10.3.jar - localhost:17757 java.lang:type=Memory HeapMemoryUsage
This produces output such as:
committed: 132579328
init: 134217728
max: 401014784
used: 18831512
An alert fires if the used value is greater than 90% of the max value. This seems flawed to me, mainly because the value of max can go down as well as up 🙂
What information should we be using to correctly monitor heap consumption?
Should I be comparing max with the value of Xmx?
I can retrieve the value of Xmx using the following command:
java -jar /usr/java/cmdline-jmxclient-0.10.3.jar - localhost:17757 java.lang:type=Runtime InputArguments
Is there a better way?
From my observations, the "max" value does fluctuate. Monitoring an example Java process, the used heap varies as you'd expect, but the committed and max values are also resized dynamically as the used heap approaches them (I believe the ratios are configurable).
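If you want to watch these three values from inside the JVM rather than through the command-line JMX client, here is a minimal sketch using the standard java.lang.management API. The class name HeapSnapshot and the usedPercent helper are my own names for illustration, and falling back to committed when max is reported as -1 (undefined) is an assumption about what a sensible check would do, not something the Nagios plugin does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of any monitoring plugin.
final class HeapSnapshot {

    /**
     * Percentage of the heap ceiling currently in use.
     * MemoryUsage.getMax() returns -1 when the maximum is undefined,
     * so in that case this falls back to the committed size (an assumption).
     */
    static double usedPercent(long used, long committed, long max) {
        long ceiling = max > 0 ? max : committed;
        if (ceiling <= 0) {
            return 0.0; // nothing sensible to report
        }
        return 100.0 * used / ceiling;
    }

    public static void main(String[] args) {
        // Same data as the java.lang:type=Memory HeapMemoryUsage attribute.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("used=%d committed=%d max=%d used%%=%.1f%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax(),
                usedPercent(heap.getUsed(), heap.getCommitted(), heap.getMax()));
    }
}
```

Running this in a loop against your own process should show whether committed and max move in your setup as they did in mine.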
In my case, the Xmx flag was set to 9 GiB and, strangely, the committed and max values occasionally exceeded this (reaching 9.2 GiB).
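To compare the configured Xmx with what the JVM reports, something along these lines should work. It is only a sketch: XmxParser and parseXmxBytes are names I made up, the regex covers just the plain -Xmx<number>[k|m|g|t] form, and taking the last occurrence when the flag is repeated is my assumption about how the JVM resolves duplicates.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class XmxParser {

    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgGtT]?)");

    /** Returns the last -Xmx value found, in bytes, or -1 if the flag is absent. */
    static long parseXmxBytes(List<String> jvmArgs) {
        long result = -1;
        for (String arg : jvmArgs) {
            Matcher m = XMX.matcher(arg);
            if (!m.matches()) {
                continue;
            }
            long value = Long.parseLong(m.group(1));
            switch (m.group(2).toLowerCase()) {
                case "k": value <<= 10; break;
                case "m": value <<= 20; break;
                case "g": value <<= 30; break;
                case "t": value <<= 40; break;
                default: break; // no suffix: plain bytes
            }
            result = value; // assumption: a later flag overrides an earlier one
        }
        return result;
    }

    public static void main(String[] args) {
        // Same list as the java.lang:type=Runtime InputArguments attribute.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long xmx = parseXmxBytes(jvmArgs);
        long reportedMax = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        System.out.println("Xmx (bytes): " + xmx + ", reported heap max (bytes): " + reportedMax);
    }
}
```

If the flag is not set at all, parseXmxBytes returns -1 and you only have the reported max to go on.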
Java tends to make aggressive use of available heap space, so a used heap size occasionally hitting 100% wouldn't bother me. Instead, I'd be more interested in the average over the last 5, 10 and 15 minutes. If the used heap stays above 90% for long periods, you may have a problem; checking your GC overhead would be a good indicator (as would any OutOfMemoryErrors, obviously).
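The averaging idea can be sketched as a fixed-size window of samples that only reports a problem once the window is full and its mean is above the threshold. RollingHeapAverage and its methods are hypothetical names. The window size depends on how often you sample: for example, 5 samples at a one-minute interval for a 5-minute average.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: rolling mean of heap-used percentages.
final class RollingHeapAverage {

    private final int capacity;
    private final Deque<Double> samples = new ArrayDeque<>();
    private double sum;

    RollingHeapAverage(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one sample, dropping the oldest once the window is full. */
    void add(double usedPercent) {
        samples.addLast(usedPercent);
        sum += usedPercent;
        if (samples.size() > capacity) {
            sum -= samples.removeFirst();
        }
    }

    /** Mean of the samples currently in the window, or 0 if there are none. */
    double average() {
        return samples.isEmpty() ? 0.0 : sum / samples.size();
    }

    /** True only when the window is full and its mean exceeds the threshold. */
    boolean sustainedAbove(double thresholdPercent) {
        return samples.size() == capacity && average() > thresholdPercent;
    }
}
```

A check built on this would alert on sustainedAbove(90.0) rather than on a single reading, so a brief spike to 100% just before a collection would not page anyone.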