Application flow
An input file consists of multiple logical documents.
- Extract one input logical document.
- Parse the elements within the document.
- Build an XML document from the input logical document.
- Write that document back to a physical file.
What would be a good way to reduce memory needs?
Right now, I read all the logical documents from a physical file into an ArrayList so that I do all the I/O once. But when I write each logical document to the output stream after processing, I hit a Java heap space error after 20,000 logical documents. The input contains about 100,000 logical documents, and I am looking for an efficient way to process and write all of them.
Don’t keep everything in memory. Instead, read from and write to disk as you go. For instance:
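A minimal sketch of that streaming loop, under some assumptions that are not in the question: logical documents are taken to be separated by a line containing only `---`, and `toXml` is a hypothetical stand-in for the real parse-and-build-XML step. The class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamingConverter {
    // Assumption: a line containing only this marker ends one logical document.
    static final String DELIMITER = "---";

    // Opens both files and streams documents from one to the other.
    static int convertFile(Path input, Path output) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            return convert(in, out);
        }
    }

    // Reads one logical document, converts it, writes it, then forgets it.
    // Returns the number of documents written.
    static int convert(BufferedReader in, Writer out) throws IOException {
        StringBuilder current = new StringBuilder();
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(DELIMITER)) {
                count += writeAndReset(current, out);
            } else {
                current.append(line).append('\n');
            }
        }
        // The last document may not be followed by a delimiter.
        count += writeAndReset(current, out);
        out.flush();
        return count;
    }

    // Writes the buffered document (if any) and clears the buffer for the next one.
    private static int writeAndReset(StringBuilder current, Writer out) throws IOException {
        if (current.length() == 0) {
            return 0;
        }
        out.write(toXml(current.toString()));
        out.write('\n');
        current.setLength(0);
        return 1;
    }

    // Hypothetical placeholder for parsing the elements and building the XML.
    static String toXml(String document) {
        String escaped = document.trim()
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        return "<document>" + escaped + "</document>";
    }
}
```

Calling `StreamingConverter.convertFile(Paths.get("in.txt"), Paths.get("out.xml"))` (file names are placeholders) would process the whole input while holding only the current document's text in the `StringBuilder`. The buffered reader and writer still batch the underlying disk access, so reading and writing incrementally should not cost much in I/O efficiency.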
(You’ll obviously want to add error handling)
That way, only one logical document is in memory at any given time.