Node.getTextContent() returns the text content of the current node and its descendants. is there

Question

Editorial Team

Asked: June 10, 2026 (2026-06-10T12:20:50+00:00)

Node.getTextContent() returns the text content of the current node and its descendants.

Is there a way to get only the text content of the current node, without the text of its descendants?

Example

<paragraph>
    <link>XML</link>
    is a 
    <strong>browser based XML editor</strong>
    editor allows users to edit XML data in an intuitive word processor.
</paragraph>

Expected output:

paragraph = is a editor allows users to edit XML data in an intuitive word processor.
link = XML
strong = browser based XML editor

I tried the code below:

String str = "<paragraph>" +
        "<link>XML</link>" +
        " is a " +
        "<strong>browser based XML editor</strong>" +
        "editor allows users to edit XML data in an intuitive word processor." +
        "</paragraph>";

org.w3c.dom.Document domDoc = null;
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;

try {
    docBuilder = docFactory.newDocumentBuilder();
    ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
    domDoc = docBuilder.parse(bis);
} catch (ParserConfigurationException e1) {
    e1.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(
        domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
    String tagname = ((Element) n).getTagName();
    System.out.println(tagname + "=" + ((Element) n).getTextContent());
}

But it gives output like this:

paragraph=XML is a browser based XML editoreditor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor

Note that the output for the paragraph element includes the text of the link and strong elements, which I don't want.
Any suggestions?

1 Answer

Editorial Team · Answer 1 · 2026-06-10T12:20:52+00:00

What you want is to filter the children of your <paragraph> node and keep only those whose node type is Node.TEXT_NODE.

Here is an example of a method that returns the desired content:

public static String getFirstLevelTextContent(Node node) {
    NodeList list = node.getChildNodes();
    StringBuilder textContent = new StringBuilder();
    for (int i = 0; i < list.getLength(); ++i) {
        Node child = list.item(i);
        if (child.getNodeType() == Node.TEXT_NODE)
            textContent.append(child.getTextContent());
    }
    return textContent.toString();
}

Applied to your example:

String str = "<paragraph>" + //
        "<link>XML</link>" + //
        " is a " + //
        "<strong>browser based XML editor</strong>" + //
        "editor allows users to edit XML data in an intuitive word processor." + //
        "</paragraph>";
Document domDoc = null;
try {
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
    domDoc = docBuilder.parse(bis);
} catch (Exception e) {
    e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
    String tagname = ((Element) n).getTagName();
    System.out.println(tagname + "=" + getFirstLevelTextContent(n));
}

Output:

paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor

The method iterates over the direct children of a Node, keeps only the text nodes (so child elements, comments and so on are skipped), and concatenates their text content.
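For anyone who wants to try this without assembling the fragments, here is a minimal self-contained sketch of the same filtering idea. The class name FirstLevelTextDemo and the parse helper are hypothetical names chosen for illustration. Parsing through a StringReader is an assumption made to avoid the platform-default charset used by str.getBytes(), and getElementsByTagName("*") stands in for the NodeIterator to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; only the filtering loop comes from the answer above.
class FirstLevelTextDemo {

    // Concatenates the values of the direct TEXT_NODE children only.
    static String getFirstLevelTextContent(Node node) {
        NodeList children = node.getChildNodes();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    // Helper (an assumption, not from the answer): parses an XML string
    // and wraps checked exceptions so callers stay short.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<paragraph>"
                + "<link>XML</link>"
                + " is a "
                + "<strong>browser based XML editor</strong>"
                + "editor allows users to edit XML data in an intuitive word processor."
                + "</paragraph>");

        // "*" matches every element, returned in document order.
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            System.out.println(e.getTagName() + "=" + getFirstLevelTextContent(e));
        }
    }
}
```

Run as is, this should print the same three lines as the output shown above. Note that CDATA sections have their own node type (Node.CDATA_SECTION_NODE), so the loop skips them unless that type is also accepted.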

There is no method on Node or Element that directly returns only the first-level text content.
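A possible alternative, assuming the standard javax.xml.xpath API is acceptable in your project: the XPath expression text() selects the text-node children of the context node, so it can replace the manual node-type check. This is only a sketch; OwnTextXPath, ownText and parse are hypothetical names, and I have not checked how this variant treats CDATA sections or adjacent text nodes in every XPath implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class illustrating the XPath-based variant.
class OwnTextXPath {

    // Returns the concatenated values of the text nodes selected by text(),
    // evaluated relative to the given node.
    static String ownText(Node node) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList texts =
                    (NodeList) xpath.evaluate("text()", node, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                sb.append(texts.item(i).getNodeValue());
            }
            return sb.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Helper (an assumption for the demo): parses an XML string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<paragraph><link>XML</link> is a <strong>editor</strong>tool.</paragraph>");
        System.out.println("paragraph=" + ownText(doc.getDocumentElement()));
    }
}
```

The plain loop over getChildNodes() has fewer moving parts and avoids creating an XPath object per call, so it is probably the better default; the XPath form is mainly convenient when the surrounding code already uses XPath.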
