I’ve always assumed it to be more efficient, when processing text files, to first read the contents (or part of it) into an std::string or char array, as — from my limited understanding — files are read from memory in blocks much larger than the size of a single character. However, I’ve heard that modern OS’s are often not actually directly reading from the file anyway, making my manually buffering the input little benefit.
Say I wanted to determine the number of a certain character in a text file. Would the following be inefficient?
while (fin.get(ch)) {
if (ch == 'n')
++char_count;
}
Granted, I guess it would depend on file size, but does anyone have any general rules about what’s the best approach?
No, your code is efficient. Files are intended to be read sequentially. Behind the scenes, a block of RAM is reserved in order to buffer the incoming stream of data. In fact, because you start processing data before the entire file has been read, your while loop ought to complete slightly sooner. Additionally, you can process a file far in excess of your computer’s main RAM without trouble.
Edit: To my surprise, Jerry’s number’s pan out. I would have assumed that any efficiencies gained by reading and parsing in chunks would be dwarfed by the cost of reading from a file. I’d really like to know where that time is being spent and how much lower the variation is when the file is not cached. Nevertheless, I have to recommend Jerry’s answer over this one, especially as he points out is that you really shouldn’t worry about it until you know you have a performance problem.