Let me preface this by saying that I’m pretty new to Java.
I have a file that contains a single line; it's about 200MB. I need to insert a newline character after every 309th character. I believe the code below does this correctly, but I keep running into OutOfMemoryErrors, and increasing the heap space hasn't helped.
Is there a less memory-intensive way of handling this?
BufferedReader r = new BufferedReader(new FileReader(fileName));
String line;
while ((line = r.readLine()) != null) {
    System.out.println(line.replaceAll("(.{309})", "$1\n"));
}
r.close();
Your code has two problems:
You're loading the entire file into memory at once: since it is a single line, readLine() will need at least 200MB of heap for it (more in practice, because Java stores characters as two bytes each); and
Using a regex like that is a horribly inefficient way of adding newlines. Straightforward character-copying code will be an order of magnitude faster.
Both of these problems are easily fixed.
Use a FileReader and a FileWriter to load 309 characters at a time, append a newline and write those out.

Update: added a test of both character-by-character and buffered reading. The buffered reading actually adds a lot of complexity, because you need to cater for the possible (but typically exceedingly rare) situation where read() returns fewer characters than you asked for even though there are still characters to read.

Firstly, the simple version:
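The original code block didn't survive here; a minimal sketch of the character-by-character approach described above (class and method names are mine, not from the post) would be:

```java
import java.io.*;

public class SplitLines {
    // Read one character at a time; after every `width`-th character
    // (309 in the question), emit a newline.
    public static void convertSimple(String inFile, String outFile, int width)
            throws IOException {
        Reader in = new BufferedReader(new FileReader(inFile));
        Writer out = new BufferedWriter(new FileWriter(outFile));
        try {
            int c;
            int count = 0;
            while ((c = in.read()) != -1) {
                out.write(c);
                if (++count == width) {
                    out.write('\n');
                    count = 0;
                }
            }
        } finally {
            in.close();
            out.close();
        }
    }
}
```

Because only one character (plus the BufferedReader/BufferedWriter buffers) is held at a time, memory use is constant regardless of file size.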
And the “block” version:
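This block is also missing from the extraction; a sketch of the buffered version, including the short-read handling the update warns about (again, names are mine), might look like:

```java
import java.io.*;

public class SplitLinesBlock {
    // Read up to `width` characters per block. Note the inner loop:
    // read() may legally return fewer characters than requested even
    // when more remain, so the buffer must be topped up until full or EOF.
    public static void convertBlock(String inFile, String outFile, int width)
            throws IOException {
        Reader in = new FileReader(inFile);
        Writer out = new BufferedWriter(new FileWriter(outFile));
        try {
            char[] buf = new char[width];
            while (true) {
                int filled = 0;
                while (filled < width) {
                    int n = in.read(buf, filled, width - filled);
                    if (n == -1) break; // end of stream
                    filled += n;
                }
                if (filled == 0) break; // nothing left to write
                out.write(buf, 0, filled);
                if (filled == width) {
                    out.write('\n');
                } else {
                    break; // short final block: EOF reached mid-block
                }
            }
        } finally {
            in.close();
            out.close();
        }
    }
}
```

The top-up loop is exactly the complexity the update refers to: without it, a short read would silently misalign every subsequent newline.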
And a method to create a test file:
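The test-file generator didn't survive either; a sketch that writes a single line of random letters (the post used ~200MB, but any size works for testing) could be:

```java
import java.io.*;
import java.util.Random;

public class CreateTestFile {
    // Write `size` random lowercase letters as one long line (no newlines),
    // mimicking the single-line input file from the question.
    public static void createTestFile(String fileName, long size)
            throws IOException {
        Writer out = new BufferedWriter(new FileWriter(fileName));
        try {
            Random rnd = new Random();
            char[] buf = new char[8192];
            long remaining = size;
            while (remaining > 0) {
                int n = (int) Math.min(buf.length, remaining);
                for (int i = 0; i < n; i++) {
                    buf[i] = (char) ('a' + rnd.nextInt(26));
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        } finally {
            out.close();
        }
    }
}
```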
These all assume the platform default character encoding, since FileReader and FileWriter don't let you specify one.
Running this test:
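The original harness isn't shown here; a self-contained sketch of the idea (the converter is inlined so the example compiles on its own, file names and sizes are mine, and the file is much smaller than the 200MB original) might be:

```java
import java.io.*;
import java.util.Random;

public class SplitTest {

    // Inlined char-by-char converter, standing in for the versions above.
    static void convert(String inFile, String outFile, int width)
            throws IOException {
        Reader in = new BufferedReader(new FileReader(inFile));
        Writer out = new BufferedWriter(new FileWriter(outFile));
        try {
            int c, count = 0;
            while ((c = in.read()) != -1) {
                out.write(c);
                if (++count == width) { out.write('\n'); count = 0; }
            }
        } finally {
            in.close();
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String inFile = "single-line.txt"; // hypothetical names
        String outFile = "wrapped.txt";

        // Create a small single-line test file.
        Writer w = new BufferedWriter(new FileWriter(inFile));
        Random rnd = new Random();
        for (int i = 0; i < 1000000; i++) {
            w.write('a' + rnd.nextInt(26));
        }
        w.close();

        // Time the conversion; repeat runs show the warm-up effect
        // (the OS file cache) discussed below.
        for (int run = 1; run <= 3; run++) {
            long start = System.currentTimeMillis();
            convert(inFile, outFile, 309);
            System.out.println("run " + run + ": "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```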
Gives this result (Intel Q9450, Windows 7 64bit, 8GB RAM, test run on 7200rpm 1.5TB drive):
Conclusion: the SHA1 hash verification is really expensive, which is why I ran versions both with and without it. Basically, after warm-up the "efficient" version is only about 2x as fast. I assume that by then the file is effectively in memory (cached by the OS).
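The SHA1 verification code isn't shown in this extract; the idea is to hash each output file and compare the digests to confirm both versions produce identical results. A sketch using the standard MessageDigest API (class and method names are mine) could be:

```java
import java.io.*;
import java.security.MessageDigest;

public class Sha1Check {
    // Compute the SHA-1 digest of a file as a hex string, so the outputs
    // of the two converter versions can be compared for byte equality.
    // This rereads the whole file, which is why it dominates the timings.
    public static String sha1Hex(String fileName) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        InputStream in = new BufferedInputStream(new FileInputStream(fileName));
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        } finally {
            in.close();
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```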
If I reverse the order of the block and char reads, the result is:
It’s interesting that the character-by-character version takes a far bigger initial hit on the first read of the file.
So, as per usual, it’s a choice between efficiency and simplicity.