I’m not sure how to deal with this (I’m new to Java). Basically, I have a program that generates more data than fits in memory (for example, 10 GB of data with 4 GB of RAM). I decided to start a thread that takes the data and writes it to disk. I know disk writes can’t keep up with the process generating the data, but I was hoping my application would simply be bound by how quickly I can write to disk. Instead, after a while I get heap OutOfMemoryError errors.
Here are the parts I think are relevant:
All data to be written is put in this variable:
private static Queue<short[]> result = new LinkedList <short[]> ();
Here’s the part that saves to file:
static class SaveToFile extends Thread {
    public void run() {
        FileWriter bw = null;
        try {
            bw = new FileWriter("output.csv");
            Thread.sleep(500); //delay the start so the queue can have some data
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println("size of results during execution is " + result.size());
        while(!result.isEmpty()) {
            short[] current = result.poll();
            try {
                bw.write(Arrays.toString(current) + "," + "\n");
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        try {
            bw.flush();
            bw.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println("file writing is done");
    }
}
I’m not sure what I’m doing wrong. Do I need to block the result queue at a certain size so my process stops writing to it? Or am I doing something wrong when writing to the file? I am showing the non-buffered version, but I have tried BufferedWriter with the same result. I have observed that while the program is running the file size is 0; it only seems to write once the program crashes. Is it holding the data in memory even without BufferedWriter, and could that be causing the memory issue?
My idea was that as the SaveToFile thread clears the queue, it makes more room for the other process to continue writing to it (these are the only threads I’m running: the main program and SaveToFile).
Yes, you do need to limit the queue’s size. The producer generating data faster than it can be written out is the most likely cause of your process running out of memory.
Another issue is that LinkedList is not synchronized, so you need to use locking when using a LinkedList to pass data between threads.
To limit the capacity, you can use ArrayBlockingQueue or LinkedBlockingQueue. As an added bonus, both are thread-safe and thus won’t require external synchronization.
Finally, if your code is I/O-bound, as it appears to be, you will probably get relatively little benefit from splitting it into two threads. This is worth bearing in mind, since it could be that you’re introducing all this extra complexity for little or no benefit.
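To make the bounded-queue suggestion concrete, here is a minimal sketch of how it could look, not a drop-in replacement for your program. The class name BoundedWriterSketch, the CAPACITY value, the END sentinel, the putOrFail helper, and the dummy data in main are all assumptions made up for illustration. The producer blocks whenever the queue is full, so it can never run further ahead of the disk than CAPACITY arrays, and the consumer blocks on take() instead of exiting the first time it finds the queue empty.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class BoundedWriterSketch {
    // Hypothetical capacity: pick it so CAPACITY * (size of one array) fits easily in the heap.
    static final int CAPACITY = 1024;
    // Sentinel object that tells the writer no more data is coming (compared by identity).
    static final short[] END = new short[0];

    static final BlockingQueue<short[]> result = new ArrayBlockingQueue<>(CAPACITY);

    // Consumer: waits in take() until data arrives and stops when it sees END.
    static Thread startWriter(BlockingQueue<short[]> queue, Path out) {
        Thread t = new Thread(() -> {
            try (BufferedWriter bw = Files.newBufferedWriter(out)) {
                for (short[] current = queue.take(); current != END; current = queue.take()) {
                    bw.write(Arrays.toString(current));
                    bw.write(",\n");
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Producer side: waits while the queue is full, but gives up if the writer has died,
    // so an I/O failure cannot leave the producer blocked forever.
    static void putOrFail(BlockingQueue<short[]> queue, short[] item, Thread writer)
            throws InterruptedException {
        while (!queue.offer(item, 1, TimeUnit.SECONDS)) {
            if (!writer.isAlive()) {
                throw new IllegalStateException("writer thread is no longer running");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Thread writer = startWriter(result, Paths.get("output.csv"));
        for (int i = 0; i < 10_000; i++) {
            short[] chunk = {(short) i, (short) (i + 1)}; // stand-in for the real generated data
            putOrFail(result, chunk, writer);
        }
        putOrFail(result, END, writer); // signal end of data
        writer.join();
        System.out.println("file writing is done");
    }
}
```

With this shape the 500 ms startup sleep is no longer needed, and memory use is capped by the queue capacity rather than by how far the producer gets ahead. Whether the second thread buys you anything is still subject to the I/O-bound caveat above.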