How can I extract the text of an element like this one via XPath?
<document>
some text
<subelement>subelement text</subelement>
postscript
</document>
The XPath expression:
/document
returns the text of the document element and of all its descendants:
some text subelement text postscript
While the XPath expression:
/document/text()
returns just the first text node:
some text
that is, “postscript” is missing.
Question
Is there a way to get the text of all text nodes that are immediate children of <document>?
Postscript
A very focused example, in case you want to test it yourself: copy it into a main method and fix the imports.
DocumentBuilder dbuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
String xml = "<?xml version='1.0' encoding='UTF-8'?>" +
"<document>"
+ "some text into document"
+ " <subelement>"
+ " some text into SUBelement"
+ " </subelement>"
+ "POSTSCRIPT"
+ "</document>";
// DocumentBuilder.parse() has no overload that takes a Reader, so wrap it in an InputSource
Document doc = dbuilder.parse(new InputSource(new StringReader(xml)));
// the usual way to get an XPath instance
XPath xp = XPathFactory.newInstance().newXPath();
// prints the string value of <document>: the text of all its descendants
System.out.println(xp.evaluate("/document", doc));
// prints only the first text node child of <document>
System.out.println(xp.evaluate("/document/text()", doc));
The XPath expression /document/text() does select all text node children of /document, but XPath.evaluate() called without a third argument converts its result to a string. That conversion follows the XPath 1.0 rule for turning a node-set into a string, the same rule <xsl:value-of> applies: only the first node in document order is used.
To print the value of all text node children, supply XPathConstants.NODESET as the third argument to XPath.evaluate(). This gives you the node-set of text nodes as a NodeList, which you can then loop through, printing each node. Or you could try passing the NodeList directly to println() and see what it prints. 🙂
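A minimal sketch of the NODESET approach, in case it helps. The class name DirectTextNodes and the helper directText are made up for illustration, and the sample XML is the one from the question with the line breaks removed; the XPath and DOM calls are the standard JAXP ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectTextNodes {

    // Hypothetical helper: concatenates the values of all text nodes
    // that are direct children of the root <document> element.
    static String directText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();

        // Asking for NODESET returns every matching text node, not just the first one.
        NodeList nodes = (NodeList) xp.evaluate("/document/text()", doc, XPathConstants.NODESET);

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            sb.append(nodes.item(i).getNodeValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document>some text"
                + "<subelement>subelement text</subelement>"
                + "postscript</document>";
        System.out.println(directText(xml)); // prints "some textpostscript"
    }
}
```

The text nodes are appended without a separator here; if you want them on separate lines or trimmed, do that inside the loop.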