I need to process a large text file (600 MB approximately) in order to format it correctly, writing the formatted output to a new text file. The problem is that writing the content into the new file stops at about 6.2 MB. Here is the code:
/* Analysis of the text in fileName to see if the lines are in the correct format
 * (Theme\tDate\tTitle\tDescription). If there are lines that are in the incorrect format,
 * the method corrects them.
 */
public static void cleanTextFile(String fileName, String destFile) throws IOException {
    OutputStreamWriter writer = null;
    BufferedReader reader = null;
    try {
        writer = new OutputStreamWriter(new FileOutputStream(destFile), "UTF8");
    } catch (IOException e) {
        System.out.println("Could not open or create the file " + destFile);
    }
    try {
        reader = new BufferedReader(new FileReader(fileName));
    } catch (FileNotFoundException e) {
        System.out.println("The file " + fileName + " doesn't exist in the folder.");
    }
    String line;
    String[] splitLine;
    StringBuilder stringBuilder = new StringBuilder("");
    while ((line = reader.readLine()) != null) {
        splitLine = line.split("\t");
        stringBuilder.append(line);
        /* If the String array resulting of the split operation doesn't have size 4,
         * then it means that there are elements of the news item missing in the line
         */
        while (splitLine.length != 4) {
            line = reader.readLine();
            stringBuilder.append(line);
            splitLine = stringBuilder.toString().split("\t");
        }
        stringBuilder.append("\n");
        writer.write(stringBuilder.toString());
        stringBuilder = new StringBuilder("");
        writer.flush();
    }
    writer.close();
    reader.close();
}
I’ve already looked for answers, but in those cases the problem usually turns out to be a writer that is never closed or a missing flush() call, and my code does both. Therefore, I suspect the problem is in the BufferedReader. What am I missing?
Look at the inner loop, while (splitLine.length != 4):
If you ever end up with more than 4 items in splitLine, the length can never get back to 4, so you’ll just keep reading data forever. You won’t even notice when you’ve reached the end of the file: readLine() returns null, and you’ll just keep appending "null" to the StringBuilder. I don’t know whether this is what’s happening (we don’t know what your data looks like), but it’s certainly feasible, and you should guard against it.
(You should also use a try/finally block for closing resources, but that’s a separate matter.)
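One way to guard against both cases might look like the sketch below. It is only an illustration under a few assumptions of mine, not a drop-in fix: the class name TextCleaner, the EXPECTED_FIELDS constant, and the int return value (a count of malformed records) are all hypothetical, and I have chosen to copy malformed records through unchanged rather than drop them, since I don't know what you want to happen to them. It keeps your approach of joining continuation lines with no separator, stops joining as soon as the record has at least 4 fields or the file ends, and uses try-with-resources (Java 7+) in place of try/finally.

```java
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical holder class; the name is not from the original post.
final class TextCleaner {
    // Theme, Date, Title, Description
    private static final int EXPECTED_FIELDS = 4;

    /**
     * Copies fileName to destFile, joining lines until a record has 4 tab-separated fields.
     * Returns the number of records that could not be brought to exactly 4 fields
     * (an assumption: they are still written out unchanged so no data is lost).
     */
    static int cleanTextFile(String fileName, String destFile) throws IOException {
        int malformed = 0;
        // try-with-resources closes both streams even if an exception is thrown.
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName));
             Writer writer = new OutputStreamWriter(
                     new FileOutputStream(destFile), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringBuilder record = new StringBuilder(line);
                int fields = line.split("\t").length;
                // Only keep reading while fields are missing; "<" instead of "!="
                // means a record with too many fields cannot loop forever.
                while (fields < EXPECTED_FIELDS) {
                    String next = reader.readLine();
                    if (next == null) {
                        break; // end of file: stop instead of appending "null"
                    }
                    record.append(next);
                    fields = record.toString().split("\t").length;
                }
                if (fields != EXPECTED_FIELDS) {
                    malformed++; // too many fields, or the file ended mid-record
                }
                writer.write(record.toString());
                writer.write('\n');
            }
        }
        return malformed;
    }
}
```

Calling TextCleaner.cleanTextFile(fileName, destFile) and checking whether the result is non-zero would tell you if your data contains records that the original loop could never have finished.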