I have an XML file (a GML file) that may be 1 GB or larger, and I need to split it into several XML files based on its content.
Basically, I need a parser that doesn’t load the whole document into memory. It must run on 32-bit, and the target OS is Windows XP and up.
I am thinking of the following options:
- extending org.xml.sax.helpers.DefaultHandler
- using Xerces
- using VTD-XML (only if it doesn’t load the content into memory; I know about VTD-XML’s Huge classes, but they can only be used on a 64-bit platform, so this only works if there’s a way to use VTD-XML on 32-bit with a file of about 2 GB)
Any guidance on the right direction is appreciated.
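If your splitting algorithm doesn’t need much context (i.e. there’s no need for a DOM or a partial DOM), then SAX (i.e. implementing a
DefaultHandler) is certainly one of the simplest approaches and doesn’t add an external dependency. SAX delivers the document as a stream of events, so memory use depends on what your handler keeps around rather than on the size of the file.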
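
To make that concrete, here is a minimal sketch of a SAX-based splitter. It rests on assumptions that are not in the question: the split rule is "one output file per occurrence of a given element name", the output files are called part-N.xml, and the class name GmlSplitter and its split method are made up for illustration. It uses java.nio.file, so it needs Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: copies every occurrence of one element (with its subtree)
// into its own file. Only the current output writer is kept in memory.
class GmlSplitter extends DefaultHandler {
    private final String splitElement; // qualified name that starts a new part (assumed split rule)
    private final Path outDir;
    private Writer out;                // non-null only while inside a split element
    private int depth;                 // nesting depth inside the current split element
    private int fileCount;

    GmlSplitter(String splitElement, Path outDir) {
        this.splitElement = splitElement;
        this.outDir = outDir;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        try {
            if (out == null) {
                if (!qName.equals(splitElement)) {
                    return; // outside any part: ignore
                }
                Path target = outDir.resolve("part-" + (++fileCount) + ".xml");
                out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
                out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            }
            depth++;
            out.write("<" + qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.write(" " + atts.getQName(i) + "=\"" + escape(atts.getValue(i)) + "\"");
            }
            out.write(">");
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (out == null) {
            return;
        }
        try {
            out.write(escape(new String(ch, start, length)));
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (out == null) {
            return;
        }
        try {
            out.write("</" + qName + ">");
            if (--depth == 0) { // the split element itself just closed
                out.close();
                out = null;
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Parses the stream with the JDK's default (non-namespace-aware) SAX parser
    // and returns the number of files written.
    static int split(InputStream in, String splitElement, Path outDir) throws Exception {
        GmlSplitter handler = new GmlSplitter(splitElement, outDir);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.fileCount;
    }
}
```

You would call it with something like GmlSplitter.split(Files.newInputStream(Paths.get("input.gml")), "gml:featureMember", Paths.get("out")), where both paths and the element name are placeholders and the output directory must already exist. Two caveats with this sketch: namespace declarations that sit on the root element are not copied into the parts, so prefixed names such as gml:... would be undeclared there unless you carry the root’s xmlns attributes over yourself; and writing one file per element can produce a very large number of files, so you may prefer to group several elements into each part.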