There appears to be a memory leak when using the standard Java library (1.6.0_27) for evaluating XPath expressions.
See below for some code to reproduce this problem:
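import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XpathTest {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(true);
        DocumentBuilder builder = docFactory.newDocumentBuilder();
        Document doc = builder.parse("test.xml");
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        // First expression: select every Product element in the document.
        XPathExpression expr = xpath.compile("//Product");
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            System.out.println(node.getAttributes().getNamedItem("id"));
            // Second expression: compiled and evaluated once per Product.
            XPathExpression testExpr = xpath.compile("Test");
            Object testResult = testExpr.evaluate(node, XPathConstants.NODE);
            Node test = (Node) testResult;
            System.out.println(test.getTextContent());
        }
        System.out.println(nodes.getLength());
    }
}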
An example XML file is given below:
<Products>
    <Product id='ID0'>
        <Test>0</Test>
    </Product>
    <Product id='ID1'>
        <Test>1</Test>
    </Product>
    <Product id='ID2'>
        <Test>2</Test>
    </Product>
    <Product id='ID3'>
        <Test>3</Test>
    </Product>
    ...
</Products>
When I run this example using the NetBeans profiler, it appears that the number of allocations for the com.sun.org.apache.xpath.internal.objects.XObject class keeps increasing, even after garbage collection.
Am I using the XPath library in an incorrect way? Is this a bug in the Java libraries? Are there any potential workarounds?
There is no “memory leak” in this case. A memory leak is a situation in which an application cannot reclaim memory. Here there is no leak, as all XObject (and XObject[]) instances can be reclaimed at some point in time.

A memory profiler snapshot obtained from VisualVM yields the following observations:

- XObject (and XObject[]) instances are created when the XPathExpression.evaluate method is invoked.
- XObject instances are reclaimed when they are no longer reachable from a GC root. In your case, the GC roots are the result and testResult local variables, which live on the stack of the main thread.

Based on the above, I suppose that your application is experiencing, or is likely to experience, memory exhaustion as opposed to a memory leak. This happens when a large number of XObject/XObject[] instances from XPath expression evaluations have not been reclaimed by the garbage collector because

- either they are still reachable from a GC root, or
- the garbage collector has not yet gotten around to reclaiming them.

The only solution to the first is to keep objects in memory only for the duration that they are required. You do not seem to be violating that in your code, but your code could certainly be made more efficient: you are retaining the result of the first XPath expression, to be used by the second, when the same work could be performed more efficiently.
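As a minimal sketch of folding the two lookups into a single evaluation of //Product/Test: the class and method names below are hypothetical, and the XML is passed in as a string rather than read from test.xml so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; an illustrative sketch only.
class SingleXPathExample {

    // Returns one "id=text" entry per Test element, using a single XPath evaluation.
    static List<String> describeProducts(String xml) throws Exception {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(true);
        DocumentBuilder builder = docFactory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compiled once and evaluated once, instead of once per Product.
        XPathExpression expr = xpath.compile("//Product/Test");
        NodeList tests = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < tests.getLength(); i++) {
            Node test = tests.item(i);
            // The parent of each matched Test element is its Product element.
            Node product = test.getParentNode();
            String id = product.getAttributes().getNamedItem("id").getNodeValue();
            lines.add(id + "=" + test.getTextContent());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Products>"
                + "<Product id='ID0'><Test>0</Test></Product>"
                + "<Product id='ID1'><Test>1</Test></Product>"
                + "</Products>";
        for (String line : describeProducts(xml)) {
            System.out.println(line);
        }
    }
}
```

Note that this is not an exact drop-in for the posted loop: it visits every Test child rather than only the first one per Product, and it silently skips any Product that has no Test child.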
//Product/Test can be used to retrieve the Test nodes directly, and the id values of the parent Product nodes can then be read from each Test node’s parent, so that only one XPath expression is evaluated instead of two.

As far as the second observation is concerned, you ought to obtain GC logs (using the -verbose:gc JVM startup flag). You could then decide to resize the young generation if too many short-lived objects are being created, since reachable objects may otherwise be moved to the tenured generation, in which case a major collection will be required to reclaim objects that are short-lived by nature. In an ideal scenario (considering your posted code), a young generation collection cycle should occur every few iterations of the for loop, as the XObject instances that are local to the loop should become reclaimable as soon as the block’s local variables go out of scope.
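If adding startup flags is inconvenient, collector activity can also be sampled from inside the program. The following is only a rough sketch using the standard java.lang.management API; the class name is hypothetical, the allocation loop is merely a stand-in for the XPath loop in the question, and the collector names and counts that get printed depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name; an illustrative sketch only.
class GcActivityExample {

    // Maps each collector's name to the number of collections it has run so far.
    // The JVM reports -1 when the count is undefined for a collector.
    static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("before: " + collectionCounts());
        // Stand-in for the loop in the question: allocate many short-lived objects.
        long checksum = 0;
        for (int i = 0; i < 2000000; i++) {
            byte[] shortLived = new byte[256];
            shortLived[0] = (byte) i;
            checksum += shortLived[0];
        }
        System.out.println("after:  " + collectionCounts() + " (checksum " + checksum + ")");
    }
}
```

Comparing the per-collector counts before and after the loop gives a coarse indication of whether the work is being absorbed by young generation collections or is triggering collections of the older generation; the GC log remains the more detailed source.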