I have an apparent memory leak in a Hadoop program I’m running. Specifically, I get the message:
ERROR GC overhead limit exceeded
followed later by the exception
attempt_201210041336_0765_m_0000000_1: Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError: GC overhead limit exceeded
attempt_201210041336_0765_m_0000000_1: at java.util.Vector.elements(Vector.java:292)
attempt_201210041336_0765_m_0000000_1: at org.apache.log4j.helpers.AppenderAttachableImpl.getAllAppenders(AppenderAttachableImpl.java:84)
attempt_201210041336_0765_m_0000000_1: at org.apache.log4j.Category.getAllAppenders(Category.java:415)
attempt_201210041336_0765_m_0000000_1: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:256)
attempt_201210041336_0765_m_0000000_1: at org.apache.hadoop.mapred.Child$3.run(Child.java:157)
I’m running on what should be very small data sets in an initial trial, so I shouldn’t be hitting any memory limit. More to the point, I don’t want to change the Hadoop configuration; if the program can’t run with the current configuration, the program needs to be rewritten.
Can anyone help me figure out how to diagnose this issue? Is there a command-line argument to get a stack trace of memory usage? Is there any other way of tracking this issue down?
P.S. I wrote the error message out by hand, since I can’t copy-paste from the system that has the issue. So please treat any typos as my own fault.
Edit: I ran the job a few more times. While I always get the
Error GC overhead limit exceeded
message, I don’t always get the log4j stack trace. So the issue is probably not log4j; instead, log4j happened to fail due to the lack of memory caused by… something else?
“GC overhead limit exceeded” means the JVM is spending more than 98% of its total time in garbage collection while recovering less than 2% of the heap. It probably means that a lot of short-lived objects are being created, more than the GC can keep up with. See this question on how to find the problematic classes and allocation spots with JProfiler.
Disclaimer: My company develops JProfiler.
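If attaching a profiler is not an option, one rough way to narrow things down from inside the job is to log heap occupancy and cumulative GC time at a few points in the task, using the standard java.lang.management beans. The following is only a minimal sketch, not anything Hadoop provides: the class name HeapSnapshot and its methods are hypothetical, and where to call it from (for example every few thousand records in the mapper) is an assumption about the job.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of Hadoop): reports heap use and GC time
// for the JVM it runs in, using only the standard management beans.
public class HeapSnapshot {

    // Fraction of the maximum heap currently in use, or -1 if the JVM
    // does not define a maximum heap size.
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }

    // Approximate total time spent in all collectors since JVM start, in ms.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // One-line summary suitable for writing to the task's stderr log.
    public static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        return String.format("heap used=%d MB, max=%d MB, GC time=%d ms",
                heap.getUsed() / mb, heap.getMax() / mb, totalGcMillis());
    }

    public static void main(String[] args) {
        System.err.println(describe());
    }
}
```

If the used fraction climbs steadily and GC time grows quickly between calls, something is holding on to objects rather than the input simply being large. If passing JVM options for a single job is acceptable, the HotSpot flag -XX:+HeapDumpOnOutOfMemoryError writes a heap dump when the error occurs, which can then be opened in a heap analyzer.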