I'm trying to create a progress bar that displays the progress of a parser reading a text file.
To do that, I get the file size in bytes using `fileSize = new FileInfo(file).Length`, and in every iteration I add up the bytes of the current line using
`sum += reader.CurrentEncoding.GetByteCount(currentLine)`
I assumed that once I had read the whole file, `sum` would be equal to `fileSize`.
But that's not the case: `sum` is always several thousand bytes lower than `fileSize`. Why is this? How can I correctly create a progress bar that shows how much of the file I have already parsed?
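A minimal sketch of the approach described above, assuming a plain `StreamReader` over a UTF-8 file. The class and method names (`LineByteCounter`, `Compare`) are hypothetical, chosen only for illustration. Running it on a small file makes the shortfall easy to see.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineByteCounter
{
    // Hypothetical helper: sums the encoded size of every line returned by
    // ReadLine(), as the question does, and returns it next to the real file size.
    public static (long Sum, long FileSize) Compare(string path)
    {
        long fileSize = new FileInfo(path).Length;
        long sum = 0;
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // ReadLine() strips the "\r", "\n" or "\r\n" terminator,
                // so those bytes never make it into the sum.
                sum += reader.CurrentEncoding.GetByteCount(line);
            }
        }
        return (sum, fileSize);
    }
}
```

For a file containing `"Hello\r\nWorld\r\n"` saved as UTF-8 without a byte order mark, `Compare` returns a sum of 10 against a file size of 14; the missing 4 bytes are the two `\r\n` terminators.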
There can be several reasons for that, but most likely it's due to the encoding. I don't just mean character encodings like UTF-8, but line endings too.
For example, a text file might contain two lines, each holding a single five-letter word.
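Counting just the bytes of each line, you'd end up with 10 bytes (5 characters each). However, each line also ends with a line terminator, `\r`, `\r\n` or `\n` depending on how the file was written, and `ReadLine()` strips it, so it is not included in the length of the line. If both lines end in `\r\n`, the file is 14 bytes on disk while the per-line sum is only 10. Over a file with thousands of lines, that easily adds up to the several-thousand-byte gap you are seeing.

Depending on your file size, you could either read the whole file into a `string[]` (for example with `File.ReadAllLines`) or instead use the stream's current position as a progress indicator.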
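Both options can be sketched as follows. This is a minimal sketch, not a definitive implementation: `ParseProgress`, `parseLine` and `report` are hypothetical names, with `report` standing in for whatever updates your progress bar. Note that `StreamReader` reads ahead into an internal buffer, so `BaseStream.Position` can run ahead of the line just returned and advances in chunks rather than line by line; for a progress bar that granularity is usually acceptable.

```csharp
using System;
using System.IO;

public static class ParseProgress
{
    // Option 1 (hypothetical helper): load all lines up front, which is fine
    // for files that fit in memory. Progress is the index of the parsed line.
    public static void ParseAllLines(string path, Action<string> parseLine, Action<double> report)
    {
        string[] lines = File.ReadAllLines(path);
        for (int i = 0; i < lines.Length; i++)
        {
            parseLine(lines[i]);
            report(100.0 * (i + 1) / lines.Length);
        }
    }

    // Option 2 (hypothetical helper): stream the file and use the position of
    // the underlying stream. Because StreamReader buffers, Position moves in
    // chunks and may be ahead of the line just parsed.
    public static void ParseStreaming(string path, Action<string> parseLine, Action<double> report)
    {
        using (var reader = new StreamReader(path))
        {
            long total = reader.BaseStream.Length;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                parseLine(line);
                report(100.0 * reader.BaseStream.Position / total);
            }
        }
    }
}
```

`ReadAllLines` gives exact per-line progress but keeps the whole file in memory; the streaming version keeps memory use flat at the cost of coarser progress updates.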