I’m writing a Java application that reads in a comma-separated text file, performs some calculations on the data, and writes the updated data to a new file. The input file contains about 500 million rows, so I’m trying to make the code below scale as well as possible so that I don’t get an out-of-memory error when I run it. Any ideas on how to improve it?
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CsvTest {
    public void readFile() {
        BufferedReader br = null;
        BufferedWriter out = null;
        try {
            br = new BufferedReader(new FileReader("C:\\input.txt"));
            FileWriter fstream = new FileWriter("C:\\output.txt");
            out = new BufferedWriter(fstream);
            String line = null;
            while ((line = br.readLine()) != null) {
                out.write(line + "\r\n");
            }
        }
        catch (FileNotFoundException ex) {
            System.err.println("Error: " + ex.getMessage());
        }
        catch (IOException ex) {
            System.err.println("Error: " + ex.getMessage());
        }
        finally {
            try {
                if (br != null) {
                    br.close();
                }
                if (out != null) {
                    out.close();
                }
            }
            catch (IOException ex) {
                System.err.println("Error: " + ex.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        CsvTest test = new CsvTest();
        test.readFile();
    }
}
Your code is pretty good. You are streaming data from the input to the output while holding only one line in memory at a time, so the memory requirement is essentially O(1) with respect to the file size. I don’t think you can do better than that.
The buffers in the BufferedReader and the BufferedWriter have a fixed size, so their memory usage is constant and negligible relative to the size of multi-GB files.
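For reference, here is a minimal sketch of the same line-by-line streaming written with try-with-resources, so both streams are closed automatically. The class name `StreamingCsv`, the `transform` placeholder, and the choice of UTF-8 are assumptions made for illustration. The question does not say what the calculations are, so `transform` just returns the row unchanged.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class StreamingCsv {

    // Hypothetical per-row step: the real calculations are not given in the
    // question, so this placeholder returns the row unchanged.
    static String transform(String line) {
        return line;
    }

    // Streams input to output one line at a time and returns the number of
    // rows written. Only the current line and the two fixed-size buffers are
    // held in memory, regardless of how large the file is.
    static long process(Path input, Path output) throws IOException {
        long count = 0;
        // Assumption: the files are UTF-8 encoded.
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(transform(line));
                writer.write("\r\n"); // same line terminator as the original code
                count++;
            }
        } // reader and writer are closed here, even if an exception is thrown
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java StreamingCsv <input> <output>");
            return;
        }
        long rows = process(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Rows written: " + rows);
    }
}
```

As long as `transform` does not accumulate rows in a collection or a growing string, memory use should stay flat no matter how many rows the input has.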
EDIT: The garbage collector should also do fine at collecting the unused data. At least, my experience with it in similar data-processing cases has been pretty positive.