I am trying to read a large text file (14 MB) line by line into a list of strings, get the distinct strings out of it, and then write them to another text file. I use the following code:
static void removeDuplicates(string filename)
{
    //Reading from the file
    Console.WriteLine("Reading from the file....");
    StreamReader sr = new StreamReader(filename);
    List<string> namesList = new List<string>();
    while (!sr.EndOfStream)
    {
        namesList.Add(sr.ReadLine());
    }
    //Getting the distinct list
    namesList = namesList.Distinct().ToList<string>();
    Console.WriteLine("Writing to the new file");
    //writing back to the file
    StreamWriter sw = new StreamWriter(filename + "_NoDuplicates", false);
    for (int i = 0; i < namesList.Count; i++)
    {
        sw.Write(namesList[i] + "\r\n");
    }
}
The problem is that the StreamWriter always stops writing after a certain number of lines, and always at the same place.
I made sure that the list contents are right and that the loop goes through all the items in the list, so the problem is only with the StreamWriter.
The list contains 1048577 items before the Distinct() and 880829 after it.
The StreamWriter stops writing in the middle of string number 880805 and doesn’t write anything after that. It even stops in the middle of a word!
Why is that happening? What am I doing wrong?
If you’re not getting an error, then my guess is that the last bit of the file is still buffered. Try adding a call to sw.Flush() at the end of your method. And, of course, you need to close the stream, which flushes the buffer anyway.
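As a minimal sketch of that suggestion, the method from the question could wrap both streams in using blocks, so they are closed (and the writer's buffer flushed) even if an exception is thrown. The class name DuplicateRemover and the int return value are illustrative additions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical wrapper class, added only so the sketch compiles on its own.
static class DuplicateRemover
{
    // Returns the number of distinct lines written (an illustrative addition).
    public static int RemoveDuplicates(string filename)
    {
        List<string> namesList = new List<string>();

        // Disposing the reader at the end of the block closes the input file.
        using (StreamReader sr = new StreamReader(filename))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                namesList.Add(line);
            }
        }

        namesList = namesList.Distinct().ToList();

        // Disposing the writer flushes any buffered text and closes the file,
        // so no explicit sw.Flush() call is needed here.
        using (StreamWriter sw = new StreamWriter(filename + "_NoDuplicates", false))
        {
            foreach (string name in namesList)
            {
                sw.Write(name + "\r\n");
            }
        }

        return namesList.Count;
    }
}
```

Calling DuplicateRemover.RemoveDuplicates(path) should then leave a complete path + "_NoDuplicates" file on disk once the method returns.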
Explanation
The StreamWriter internally uses a buffer. Every time you call Write(), the data is actually written to the buffer in memory. When the buffer fills up, it gets flushed to the disk.
The problem you were seeing is because the last few lines of the file you’re writing didn’t fill up the buffer, so there was nothing to trigger a flush to disk. It always occurs at the same point in the file because that is the last whole multiple of the buffer size. By closing the stream you cause any remaining data to be flushed to disk.
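The buffering described above can be observed with a small experiment. This sketch uses a MemoryStream instead of a file so the underlying stream's length can be inspected directly; the names BufferDemo and LengthsAroundFlush are made up for illustration, and it assumes the text is short enough to fit in the writer's buffer.

```csharp
using System.IO;

// Hypothetical demo class, not part of the original question or answer.
static class BufferDemo
{
    // Returns { stream length before Flush(), stream length after Flush() }.
    public static long[] LengthsAroundFlush(string text)
    {
        using (MemoryStream ms = new MemoryStream())
        using (StreamWriter sw = new StreamWriter(ms))
        {
            // The text goes into the writer's internal buffer first.
            sw.Write(text);
            long before = ms.Length;   // short text: nothing has reached the stream yet

            // Flush() pushes the buffered text to the underlying stream.
            sw.Flush();
            long after = ms.Length;

            return new[] { before, after };
        }
    }
}
```

For a short string such as "hello", the first value should be 0 and the second should be greater than 0, which mirrors the truncated tail of the output file in the question: the text was written to the buffer but never pushed to the file.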