I have a huge XML files up to 1-2gb, and obviously I can’t parse

Question

Asked: May 15, 2026

I have huge XML files, up to 1-2 GB each, and obviously I can’t parse a whole file at once. I’d have to split it into parts, then parse the parts and process them.

How can I count the occurrences of a certain node, so I can work out how many parts I need to split the file into? Is there maybe a better way to do this? I’m open to all suggestions, thank you.

Question update:

Well, I did use StAX, but maybe the logic I’m using it for is wrong. I parse the file, and for each node I get the node value and store it in a StringBuilder. Then, in another method, I go through the StringBuilder and edit the output, and write that output to a file. I can’t process more than 10,000 objects this way.

Here is the exception I get:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.sun.org.apache.xerces.internal.util.NamespaceSupport.<init>(Unknown Source)
        at com.sun.xml.internal.stream.events.XMLEventAllocatorImpl.setNamespaceContext(Unknown Source)
        at com.sun.xml.internal.stream.events.XMLEventAllocatorImpl.getXMLEvent(Unknown Source)
        at com.sun.xml.internal.stream.events.XMLEventAllocatorImpl.allocate(Unknown Source)
        at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX.bridge(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX.parse(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)

Actually, I think my whole approach is wrong. What I’m actually trying to do is convert XML files into CSV samples. Here is how I do it so far:

Read/parse the XML file
For each element node, get the text node value
Open a stream and write the values to a temp file for n nodes, then flush and close the stream
Open another stream, read from the temp file, use the commons strip utils and some other processing to create proper CSV output, then write it to the CSV file

1 Answer

Editorial Team · Answer 1 · 2026-05-15T21:07:06+00:00

The SAX or StAX API would be your best bet here. Neither parses the whole document at once; each hands your app one node at a time to process. That makes them good for arbitrarily large documents, as long as your own code does not accumulate the results in memory.
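To make the node-counting part of the question concrete, here is a minimal sketch using the StAX cursor API (`XMLStreamReader`). The class name `ElementCounter` and the method `count` are made up for illustration, and it assumes you want to match on the element's local name only, ignoring namespaces.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
class ElementCounter {

    /** Counts start tags whose local name equals elementName, in a single pass. */
    static long count(InputStream in, String elementName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long n = 0;
        try {
            // Pull one event at a time; nothing is kept in memory between events.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    n++;
                }
            }
        } finally {
            reader.close();
        }
        return n;
    }
}
```

You would call it with something like `ElementCounter.count(new FileInputStream(path), "item")`, where `path` and `"item"` stand in for your own file and node name. The cursor API is used here rather than `XMLEventReader` because it does not allocate an event object per node.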

SAX is the older API and works on a push model: the parser calls back into your handler as it reads. StAX is newer and is a pull parser, where your code asks for the next event, which makes it rather easier to use. For your requirements, either one would be fine.
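For comparison, a rough sketch of what the push model looks like with SAX: you hand the parser a handler and it calls `startElement` for every start tag. The class `SaxElementCounter` is a made-up name, and the match is on the qualified name because the default `SAXParserFactory` is not namespace-aware.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
class SaxElementCounter extends DefaultHandler {
    private final String elementName;
    private long count;

    SaxElementCounter(String elementName) {
        this.elementName = elementName;
    }

    // Called by the parser for every start tag (the "push" part).
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (elementName.equals(qName)) {
            count++;
        }
    }

    long getCount() {
        return count;
    }

    /** Parses the stream once and returns how many matching start tags were seen. */
    static long count(InputStream in, String elementName) throws Exception {
        SaxElementCounter handler = new SaxElementCounter(elementName);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.getCount();
    }
}
```

With a pull parser the loop lives in your code; here the loop lives inside the parser and your state has to live in handler fields, which is the main reason StAX tends to feel simpler.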

See this tutorial to get you started with StAX parsing.
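For the XML-to-CSV goal in the question update, one option is to skip the StringBuilder and temp file and write each CSV row as soon as its record has been read. The following is only a sketch under assumptions the question does not confirm: each record is an element named by `recordName`, and its children are flat leaf elements that become the columns. `XmlToCsv`, `convert`, and `quote` are made-up names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical converter; assumes flat records (no nested fields).
class XmlToCsv {

    static void convert(InputStream xml, Writer csv, String recordName)
            throws XMLStreamException, IOException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> row = null;                 // fields of the record being read, or null
        StringBuilder text = new StringBuilder(); // text of the current field only
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (recordName.equals(r.getLocalName())) {
                            row = new ArrayList<>();
                        }
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (row != null) {
                            text.append(r.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (row == null) {
                            break; // outside any record
                        }
                        if (recordName.equals(r.getLocalName())) {
                            // Record finished: emit the row and drop it from memory.
                            csv.write(String.join(",", row));
                            csv.write("\n");
                            row = null;
                        } else {
                            row.add(quote(text.toString().trim()));
                        }
                        text.setLength(0);
                        break;
                    default:
                        break;
                }
            }
        } finally {
            r.close();
        }
        csv.flush();
    }

    /** Quotes a field if it contains a comma, quote, or line break. */
    static String quote(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }
}
```

Wrapping the output in a `BufferedWriter` over a `FileWriter` should keep memory use bounded by the size of one record rather than the size of the file, which is the point of streaming in the first place.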
