I currently do this, and the conversion to std::string at the end takes 98% of the execution time. There must be a better way!
std::string
file2string(std::string filename)
{
    std::ifstream file(filename.c_str());
    if(!file.is_open()){
        // If they passed a bad file name, or one we have no read access to,
        // we pass back an empty string.
        return "";
    }
    // find out how much data there is
    file.seekg(0,std::ios::end);
    std::streampos length = file.tellg();
    file.seekg(0,std::ios::beg);
    // Get a vector of that size
    std::vector<char> buf(length);
    // Fill the buffer with the file's contents
    file.read(&buf[0],length);
    file.close();
    // return buffer as string
    std::string s(buf.begin(),buf.end());
    return s;
}
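One way to avoid the final copy is to size the std::string up front and read directly into it. This is only a sketch, and it assumes the file is opened in binary mode and that no code conversion takes place, so the byte count obtained by seeking matches the number of characters read. The name file2string_direct is hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical variant of file2string: same seek-based sizing, but the data
// is read straight into the string, so no vector-to-string copy is needed.
std::string file2string_direct(const std::string& filename)
{
    // Binary mode, so end-of-line translation cannot change the count.
    std::ifstream file(filename.c_str(), std::ios::in | std::ios::binary);
    if (!file.is_open()) {
        return std::string();
    }

    // Find out how much data there is.
    file.seekg(0, std::ios::end);
    const std::streamoff length = file.tellg();
    file.seekg(0, std::ios::beg);
    if (length <= 0) {
        return std::string();   // empty file, or the size is unavailable
    }

    std::string s(static_cast<std::string::size_type>(length), '\0');
    file.read(&s[0], static_cast<std::streamsize>(length));
    // Keep only what was actually read, in case the read came up short.
    s.resize(static_cast<std::string::size_type>(file.gcount()));
    return s;
}
```

Whether this is actually faster than the vector version depends on the library and should be measured.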
Being a big fan of the C++ iterator abstraction and the algorithms, I would love the following to be the fastest way to read a file (or any other input stream) into a std::string (and then print the content):

This certainly is fast for my own implementation of IOStreams, but it requires a lot of trickery to actually get it fast. Primarily, it requires optimizing algorithms to cope with segmented sequences: a stream can be seen as a sequence of input buffers. I’m not aware of any STL implementation consistently doing this optimization. The odd use of std::skipws is just to get a reference to the just-created stream: the std::istreambuf_iterator<char> expects a reference, to which the temporary file stream wouldn’t bind.

Since this probably isn’t the fastest approach, I would be inclined to use std::getline() with a particular “newline” character, i.e. one which isn’t in the file:

This assumes that the file doesn’t contain a null character. Any other character would do as well. Unfortunately, std::getline() takes a char_type as delimiting argument rather than an int_type: with an int_type you could use eof() as a “character” which never occurs (char_type, int_type, and eof() refer to the respective members of char_traits<char>). The member std::istream::getline() can’t be used either, because you would need to know ahead of time how many characters are in the file.

BTW, I saw some attempts to use seeking to determine the size of the file. This is bound not to work too well. The problem is that the code conversion done in std::ifstream (well, actually in std::filebuf) can create a different number of characters than there are bytes in the file. Admittedly, this isn’t the case when using the default C locale, and it is possible to detect that this doesn’t do any conversion. Otherwise the best bet for the stream would be to run over the file and determine the number of characters being produced. I actually think that this is what would need to be done when the code conversion could do something interesting, although I don’t think it actually is done. However, none of the examples explicitly set up the C locale, using e.g. std::locale::global(std::locale("C"));. Even with this it is also necessary to open the file in std::ios_base::binary mode, because otherwise end-of-line sequences may be replaced by a single character when reading. Admittedly, this would only make the result shorter, never longer.

The other approaches using the extraction from std::streambuf* (i.e. those involving rdbuf()) all require that the resulting content is copied at some point. Given that the file may actually be very large, this may not be an option. Without the copy this could very well be the fastest approach, however. To avoid the copy, it would be possible to create a simple custom stream buffer which takes a reference to a std::string as constructor argument and directly appends to this std::string:

At least with a suitably chosen buffer I would expect this version to be fairly fast. Which version is the fastest will certainly depend on the system, the standard C++ library being used, and probably a number of other factors, i.e. you want to measure the performance.
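
As a rough sketch of the three approaches discussed above (the iterator pair, std::getline() with a null delimiter, and a custom stream buffer appending to a std::string), something like the following should work. The names read_with_iterators, read_with_getline, appendbuf and read_with_appendbuf, as well as the 8192-character buffer, are assumptions made for illustration. Named streams are used instead of the std::skipws trick on a temporary, and the files are opened in binary mode.

```cpp
#include <fstream>
#include <iterator>
#include <ostream>
#include <streambuf>
#include <string>

// 1. Iterator pair: build the string from stream buffer iterators.
std::string read_with_iterators(const char* name)
{
    std::ifstream in(name, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// 2. std::getline() with a delimiter assumed not to occur in the file.
std::string read_with_getline(const char* name)
{
    std::ifstream in(name, std::ios::binary);
    std::string s;
    std::getline(in, s, '\0');   // assumption: no null character in the file
    return s;
}

// 3. Stream buffer which appends everything written to it to a std::string.
class appendbuf : public std::streambuf
{
public:
    explicit appendbuf(std::string& target) : target_(target)
    {
        // Keep one slot spare so overflow() can always store its character.
        setp(buffer_, buffer_ + bufsize - 1);
    }

protected:
    int_type overflow(int_type c)
    {
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);
            pbump(1);
        }
        // Move the buffered characters into the target and reset the buffer.
        target_.append(pbase(),
                       static_cast<std::string::size_type>(pptr() - pbase()));
        setp(buffer_, buffer_ + bufsize - 1);
        return traits_type::not_eof(c);
    }

    int sync()
    {
        overflow(traits_type::eof());   // flush whatever is still buffered
        return 0;
    }

private:
    enum { bufsize = 8192 };
    std::string& target_;
    char buffer_[bufsize];
};

std::string read_with_appendbuf(const char* name)
{
    std::ifstream in(name, std::ios::binary);
    std::string s;
    appendbuf sbuf(s);
    std::ostream out(&sbuf);
    // Stream the file's buffer into the custom buffer, then flush the rest.
    out << in.rdbuf() << std::flush;
    return s;
}
```

For a file without null characters, all three functions should return the same content. Which one is fastest on a given system is something to measure rather than assume.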