The usual way to read a file in C++ is this one:
std::ifstream file("file.txt", std::ios::binary | std::ios::ate);
std::vector<char> data(file.tellg());
file.seekg(0, std::ios::beg);
file.read(data.data(), data.size());
Reading a 1.6 MB file is almost instant.
But recently, I discovered std::istream_iterator and wanted to try it in order to code a beautiful one-line way to read the content of a file. Like this:
std::vector<char> data(std::istream_iterator<char>(std::ifstream("file.txt", std::ios::binary)), std::istream_iterator<char>());
The code is nice, but very slow. It takes about 2/3 seconds to read the same 1.6 MB file. I understand that it may not be the best way to read a file, but why is it so slow?
Reading a file in a classical way goes like this (I’m talking only about the read function):
- the istream contains a filebuf which contains a block of data from the file
- the read function calls sgetn from the filebuf, which copies the chars one by one (no memcpy) from the internal buffer into data's buffer
- when the data inside of the filebuf is entirely read, the filebuf reads the next block from the file
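To make these steps concrete, here is a minimal sketch of what I understand read() to boil down to. read_block is a made-up helper name, not a standard function, and a real read() also constructs a sentry and updates gcount and the stream state, which I have left out:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for an in-memory stream when trying this out
#include <vector>

// Hypothetical helper: pull up to n chars straight from the stream buffer,
// which is roughly what an unformatted read() ends up doing.
std::vector<char> read_block(std::istream& in, std::size_t n) {
    std::vector<char> out(n);
    // sgetn copies from the buffer's get area and refills it when it runs dry.
    std::streamsize got =
        in.rdbuf()->sgetn(out.data(), static_cast<std::streamsize>(n));
    // Keep only the chars that were actually delivered.
    out.resize(static_cast<std::size_t>(got));
    return out;
}
```

The whole block is requested in a single call, so the per-character work stays inside the stream buffer.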
When you read a file using istream_iterator, it goes like this:
- the vector calls *iterator to get the next char (this simply reads a variable), adds it to the end and increases its own size
- if the vector's allocated space is full (which does not happen often), a reallocation is performed
- then it calls ++iterator, which reads the next char from the stream (operator >> with a char parameter, which presumably just calls the filebuf's sbumpc function)
- finally it compares the iterator with the end iterator, which is done by comparing two pointers
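Written out by hand, I would expect the steps above to look roughly like the loop below. This is only a sketch of my mental model: read_formatted is a name I made up, and the actual constructor may be implemented differently.

```cpp
#include <istream>
#include <sstream>   // only needed for an in-memory stream when trying this out
#include <vector>

// Hypothetical helper: approximately the work done by constructing a
// vector<char> from a pair of istream_iterator<char>.
std::vector<char> read_formatted(std::istream& in) {
    std::vector<char> out;
    char c;
    // One formatted extraction (operator>>) per char; the loop ends when
    // the extraction fails, which is when the iterator would equal "end".
    while (in >> c) {
        out.push_back(c);  // may reallocate when the capacity is exhausted
    }
    return out;
}
```

The point of the sketch is that there is one operator>> call per character, instead of one bulk copy per block.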
I must admit that the second way is not very efficient, but it's at least 200 times slower than the first. How is that possible?
I thought that the performance killer was the reallocations or the inserts, but I tried creating the entire vector up front and calling std::copy, and it's just as slow.
// also very slow:
std::vector<char> data2(1730608);
std::copy(std::istream_iterator<char>(std::ifstream("file.txt", std::ios::binary)), std::istream_iterator<char>(), data2.begin());
You should compare apples to apples.
Your first snippet reads unformatted binary data because it uses the member function read(), not because it opens the file with std::ios::binary. See http://stdcxx.apache.org/doc/stdlibug/30-4.html for a fuller explanation, but in short: “The effect of the binary open mode is frequently misunderstood. It does not put the inserters and extractors into a binary mode, and hence suppress the formatting they usually perform. Binary input and output is done solely by basic_istream<>::read() and basic_ostream<>::write()”
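So your second snippet, with istream_iterator, reads formatted text: every element goes through operator>>, which sets up a sentry and, with the default flags, skips whitespace before extracting each char. That is much slower, and it also silently drops whitespace bytes from the data.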
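A small sketch makes the formatting visible. formatted_chars is a hypothetical helper (not part of any library) that collects characters through istream_iterator from an in-memory stream, optionally after switching off whitespace skipping with std::noskipws:

```cpp
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: gather every char that istream_iterator<char> yields.
std::string formatted_chars(const std::string& text, bool skip_ws) {
    std::istringstream in(text);
    if (!skip_ws) {
        in >> std::noskipws;  // keep spaces and newlines in the output
    }
    // Each element is produced by a formatted extraction (operator>>).
    return std::string(std::istream_iterator<char>(in),
                       std::istream_iterator<char>());
}
```

With the default flags the spaces and newlines in the input are missing from the result; with std::noskipws they come through, but every char still pays for a formatted extraction.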
If you want to read unformatted binary data with iterators, use std::istreambuf_iterator instead.
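Something along these lines should work. read_file is just an illustrative name. The stream is a named variable rather than a temporary, because the iterator's constructor takes a non-const reference and binding a temporary to it only compiles as a compiler extension.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file through the stream buffer,
// bypassing operator>> and its formatting.
std::vector<char> read_file(const std::string& path) {
    std::ifstream file(path.c_str(), std::ios::binary);
    // istreambuf_iterator pulls chars directly from the filebuf, so no
    // whitespace is skipped. A file that fails to open gives an empty vector.
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

Unlike the istream_iterator version, the returned vector holds every byte of the file, whitespace included.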
On my platform (VS2008), istream_iterator is about 100 times slower than read(). istreambuf_iterator performs better, but is still about 10 times slower than read().