Could someone please explain why this is happening? I have reduced my problem to a simple program; the details of the problem I’m facing are below:
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
"<title text=\"title1\">\n" +
" <comment id=\"comment1\">\n" +
" <data> abcd </data>\n" +
" <data> efgh </data>\n" +
" </comment>\n" +
" <comment id=\"comment2\">\n" +
" <data> ijkl </data>\n" +
" <data> mnop </data>\n" +
" <data> qrst </data>\n" +
" </comment>\n" +
"</title>\n";
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xml)));
System.out.println(doc.getFirstChild().getNodeName());
System.out.println(doc.getFirstChild().getFirstChild().getNodeName());
The corresponding output is:
title
#text
Firstly, why can’t I get the comment node?
Secondly, why does the data node get interpreted as a #text node?
What would be the correct and simple way to get the required nodes? Please also note that the XML file is not fixed; I want a general solution. Thanks.
EDIT:
I get a similar problem when using XPath; see the code below:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/title/comment/data/text()");
NodeList result = (NodeList) expr.evaluate(msg.document(), XPathConstants.NODESET);
for(int i = 0; i < result.getLength(); i++)
System.out.println(result.item(i).getNodeName() + " : " + result.item(i).getNodeValue());
This gives the output:
#text : abcd
#text : efgh
#text : ijkl
#text : mnop
#text : qrst
The first child of the title node is a text node containing the \n and the four spaces before the <comment> element starts. To get the comment node, ask its parent for its second child node, or for its first element with the tag name "comment". You may also loop through the children and return the first node of type ELEMENT_NODE.
<data> is an element node containing a text node. The value of that text node is " abcd ".
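The loop over the children described above could look roughly like the sketch below. The class name FirstElementDemo and the helpers firstElementChild and parse are made up for illustration (they are not part of the DOM API), and the XML is a shortened version of the one in the question. It also shows getElementsByTagName as the lookup by tag name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstElementDemo {

    // Hypothetical helper: returns the first child that is an element,
    // skipping whitespace text nodes, comments and so on; null if none exists.
    static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    // Hypothetical helper: parses an XML string the same way the question does.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<title text=\"title1\">\n"
                + "    <comment id=\"comment1\">\n"
                + "        <data> abcd </data>\n"
                + "    </comment>\n"
                + "</title>\n";
        Document doc = parse(xml);

        // getDocumentElement() returns the root element directly.
        Element title = doc.getDocumentElement();

        // Skip the leading whitespace text node and land on <comment>.
        Element comment = firstElementChild(title);
        System.out.println(comment.getNodeName() + " id=" + comment.getAttribute("id"));

        // <data> is an element; its text lives in a child text node.
        Element data = firstElementChild(comment);
        System.out.println(data.getNodeName() + " : " + data.getTextContent().trim());

        // Lookup by tag name; note this matches all descendants, not only direct children.
        NodeList comments = title.getElementsByTagName("comment");
        System.out.println(comments.getLength());
    }
}
```

Because the helper only checks the node type, it does not depend on how the file is indented, which should suit XML whose layout is not fixed.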