I need to parse an xml file, no matter the tags in it, and

Question

Asked: June 6, 2026

I need to parse an XML file, whatever tags it contains, and read the text of all its leaves (text-only elements). I’m using StAX, but there seems to be no way to know in advance that an element is text-only (so getElementText throws an exception for a non-leaf element).
So I decided to use a filter that keeps only element tags, and iterate through the document this way:

InputStream in = null;
    try {
        in = new FileInputStream("file.xml");
        DatiEstratti de = DatiEstratti.getInstance();

        // Event-based processing
        XMLInputFactory factory = (XMLInputFactory) XMLInputFactory.newInstance();

        XMLEventReader eventReader = factory.createXMLEventReader(in);
        // use the filter to keep only element tags
        eventReader = factory.createFilteredReader(eventReader, new ElementOnlyFilter());

        while (eventReader.hasNext()) {

            XMLEvent event = eventReader.nextEvent();

            if (event.getEventType() == XMLStreamConstants.START_ELEMENT) {
                StartElement startElement = event.asStartElement();

                XMLEvent peekEvent = eventReader.peek();
                if(peekEvent.isEndElement()){
                    // this is the first time a pop happens,
                    // so this is a leaf.
                    // retrieve the data.
                    String value = eventReader.getElementText();

                    logger.info("dato : " + value);
                }


                String nome = startElement.getName().getLocalPart();
                String prefix = startElement.getName().getPrefix();
                // QName.getPrefix() returns "" (not null) when there is no prefix
                if (prefix != null && !prefix.isEmpty()) {
                    nome = prefix + ":" + nome;
                }
                de.push(nome);
                logger.info("push : " + de.stampaPercorso());



            } else if ((event.getEventType() == XMLStreamConstants.END_ELEMENT)) {

                de.pop();
                logger.info("pop : " + de.stampaPercorso());
                if (0 > de.nLivelliPercorso()) {
                    break;
                }
            }
            //handle more event types here...
        }

… where the filter is:

public class ElementOnlyFilter implements EventFilter, StreamFilter {

    /* implementation of EventFilter interface */
    @Override
    public boolean accept(XMLEvent event) {
        return acceptInternal(event.getEventType());
    }

    /* implementation of StreamFilter interface */
    @Override
    public boolean accept(XMLStreamReader reader) {
        return acceptInternal(reader.getEventType());
    }

    /* internal utility method */
    private boolean acceptInternal(int eventType) {
        return eventType == XMLStreamConstants.START_ELEMENT
                || eventType == XMLStreamConstants.END_ELEMENT;
    }

}

The problem is that I get the following exception when a leaf is found:

    javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,42]
Message: parser must be on START_ELEMENT to read next text
    at com.sun.xml.internal.stream.XMLEventReaderImpl.getElementText(XMLEventReaderImpl.java:114)
    at javax.xml.stream.util.EventReaderDelegate.getElementText(EventReaderDelegate.java:88)
    at xmlparser.XmlParser.main(XmlParser.java:63)

I wonder why. Is there a fault in this code? I thought peek() does not advance the reader, so getElementText() should still be called on a start element.
Is there another way to accomplish my goal?

1 Answer

Editorial Team · Answer 1 · 2026-06-06T21:17:20+00:00

Firstly, if you filter to include only start and end element events then you won’t see the text contained inside your leaf nodes at all. I would use a different approach, with an unfiltered stream, like this:

XMLEventReader eventReader = factory.createXMLEventReader(in);
StringBuilder content = null;
while(eventReader.hasNext()) {
  XMLEvent event = eventReader.nextEvent();
  if(event.isStartElement()) {
    // other start element processing here
    content = new StringBuilder();
  } else if(event.isEndElement()) {
    if(content != null) {
      // this was a leaf element
      String leafText = content.toString();
      // do something with the leaf node
    } else {
      // not a leaf
    }
    // in all cases, discard content
    content = null;
  } else if(event.isCharacters()) {
    if(content != null) {
      content.append(event.asCharacters().getData());
    }
  }
  // other event types here
}

The trick is the content = null at the end of the end-element branch. On entry to the if(event.isEndElement()) block, a non-null content means there have been no intervening end element events between this one and its corresponding start tag, i.e. it’s a leaf node.
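
For completeness, here is a minimal self-contained sketch of the same idea that also tracks the element path, roughly as the push/pop calls in the question do. The class name LeafTextExtractor, the method leafTexts and the "path=text" output format are assumptions made for illustration; they are not part of the original code. It reads from a String rather than a file only to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper; the names here are illustrative only.
class LeafTextExtractor {

    /** Returns one "path=text" entry per leaf element, in document order. */
    static List<String> leafTexts(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Unfiltered reader: character events must reach the loop below.
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));

        List<String> leaves = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>();   // currently open elements, root first
        StringBuilder content = null;              // non-null only right after a start tag
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    String prefix = start.getName().getPrefix();   // "" when unprefixed
                    String local = start.getName().getLocalPart();
                    path.addLast(prefix.isEmpty() ? local : prefix + ":" + local);
                    content = new StringBuilder();
                } else if (event.isCharacters()) {
                    if (content != null) {
                        content.append(event.asCharacters().getData());
                    }
                } else if (event.isEndElement()) {
                    if (content != null) {
                        // No child element was opened since the start tag: a leaf.
                        leaves.add(String.join("/", path) + "=" + content);
                    }
                    path.removeLast();
                    content = null;   // the parent now has a child, so it is not a leaf
                }
            }
        } finally {
            reader.close();
        }
        return leaves;
    }
}
```

Calling LeafTextExtractor.leafTexts(xml) yields one entry per leaf. Note that with this approach an empty element such as <c/> also counts as a leaf, with empty text, and text that follows a child element inside a mixed-content parent is ignored.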
