My basic Java problem is this: I need to read in a file by chunks, then reverse the order of the chunks, then write that out to a new file. My first (naive) attempt followed this approach:
- read a chunk from the file
- reverse the bytes of the chunk
- push the bytes one at a time to the front of a results list
- repeat for all chunks
- write the results list to a new file
This is a very naive and slow way to solve the problem, but it generates the correct output that I am looking for. To try to improve the situation, I changed to this algorithm:
- read a chunk from the file
- push that chunk onto the front of a list of arrays
- repeat for all chunks
- foreach chunk, write to new file
To my mind, that should produce the same output, except it doesn’t, and I am quite confused. The first chunk in the result file matches with both methods, but the rest of the file is completely different.
Here is the meat of the Java code I am using:
    FileInputStream in;
    FileOutputStream out, out2;
    Byte[] t = new Byte[0];
    LinkedList<Byte> reversed_data = new LinkedList<Byte>();
    byte[] data = new byte[bufferSize];
    LinkedList<byte[]> revd2 = new LinkedList<byte[]>();
    try {
        in = new FileInputStream(infile);
        out = new FileOutputStream(outfile1);
        out2 = new FileOutputStream(outfile2);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return;
    }
    while (in.read(data) != -1)
    {
        revd2.addFirst(data);
        byte[] revd = reverse(data);
        for (byte b : revd)
        {
            reversed_data.addFirst(b);
        }
    }
    for (Byte b : reversed_data)
    {
        out.write(b);
    }
    for (byte[] b : revd2)
    {
        out2.write(b);
    }
At http://pastie.org/3113665 you can see a complete example program (along with my debugging attempts). For simplicity I am using a bufferSize that evenly divides the size of the file, so all chunks will be the same size, but this won’t hold in the real world. My question is: WHY don’t these two methods generate the same output? It’s driving me crazy because I can’t grok it.
You’re constantly overwriting the data you’ve read previously.
You’re adding the same object repeatedly to the list revd2, so each list node will finally contain a reference to data filled with the result of the last read. I suggest replacing that with revd2.addFirst(data.clone()).
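To make the aliasing concrete, here is a minimal sketch that runs both variants against an in-memory stream instead of files. The class name ChunkReverser and its method names are made up for illustration and are not from the original program. The copying variant uses Arrays.copyOf(data, n) rather than data.clone(), on the assumption that the last read may fill only part of the buffer once the file size is no longer a multiple of bufferSize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedList;

// Hypothetical demo class; names are illustrative only.
public class ChunkReverser {

    // Buggy variant: every list node refers to the one shared buffer,
    // so all nodes end up showing the bytes of the last read.
    public static byte[] reverseChunksAliased(InputStream in, int bufferSize) {
        byte[] data = new byte[bufferSize];
        LinkedList<byte[]> chunks = new LinkedList<>();
        try {
            while (in.read(data) != -1) {
                chunks.addFirst(data); // same reference added each time
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return concat(chunks);
    }

    // Fixed variant: store an independent copy of exactly the bytes read.
    public static byte[] reverseChunksCopied(InputStream in, int bufferSize) {
        byte[] data = new byte[bufferSize];
        LinkedList<byte[]> chunks = new LinkedList<>();
        try {
            int n;
            while ((n = in.read(data)) != -1) {
                chunks.addFirst(Arrays.copyOf(data, n)); // copy of the first n bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return concat(chunks);
    }

    // Joins the chunks in list order into one byte array.
    private static byte[] concat(LinkedList<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "AABBCC".getBytes(StandardCharsets.US_ASCII);
        byte[] aliased = reverseChunksAliased(new ByteArrayInputStream(input), 2);
        byte[] copied = reverseChunksCopied(new ByteArrayInputStream(input), 2);
        System.out.println(new String(aliased, StandardCharsets.US_ASCII)); // prints CCCCCC
        System.out.println(new String(copied, StandardCharsets.US_ASCII));  // prints CCBBAA
    }
}
```

The aliased output starts with the correct chunk (the last one read) and then repeats it, which matches the symptom in the question: the first chunk agrees and the rest differs. Copying only n bytes also keeps a short final chunk from carrying stale bytes left over from the previous read.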