I have a large number of unsigned 32-bit integers in memory (about 1.5 billion entries). I need to write them to a file and read them back into main memory.
Currently I do it using:
    ofstream ofs;
    ofs.open(filename);
    for (uint64_t i = 0; i < 1470000000; i++)
        ofs << integers[i] << " ";
and
    ifstream ifs;
    ifs.open(filename);
    for (uint64_t i = 0; i < 1470000000; i++)
        ifs >> integers[i];
This takes a few minutes to execute. Is there a library method that does this faster, or any suggestion I could run a performance test on? Could somebody also show some simple C++ code that uses mmap to do the above (on Linux)?
EDIT: EXAMPLE CASE
    #include <iostream>
    #include <stdint.h>
    #include <cstdio>
    #include <cstdlib>
    #include <sstream>
    using namespace std;

    int main()
    {
        uint32_t* ele = new uint32_t[100];
        for (int i = 0; i < 100; i++)
            ele[i] = i;

        for (int i = 0; i < 100; i++) {
            if (ele[i] < 20)
                continue;
            else
                // write ele[i] to file
                ;
        }

        for (int i = 0; i < 100; i++) {
            if (ele[i] < 20)
                continue;
            else
                // read number from file
                // ele[i] = number * 10 ;
                ;
        }

        delete[] ele;
        std::cin.get();
    }
The first thing to do is to determine where the time is going.
Formatting and parsing text isn’t trivial, and can take some
time, but so can the actual writing and reading, given the size
of the file. The second thing is to determine how “portable”
the data have to be: the fastest solution is almost certainly to
mmap (or its Windows equivalent) the array to the file
directly, and never read or write. This doesn’t provide
a portable representation, however, and even upgrading the
compiler might make the data unreadable. (Unlikely for 32-bit
integers today, but it has happened in the past.)
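Since the question asks for mmap code on Linux, here is a minimal sketch of the idea. The names `MappedInts`, `mapInts` and `unmapInts` are made up for illustration, the values are stored in host byte order (so the file is exactly as non-portable as described above), and the error handling is deliberately crude.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper type: a file-backed array of uint32_t.
struct MappedInts {
    std::uint32_t* data;
    std::size_t count;
};

// Maps `count` integers from `path` read/write, creating or resizing
// the file so that it is exactly as large as the array.
MappedInts mapInts(const char* path, std::size_t count)
{
    assert(count > 0);  // mmap rejects a zero-length mapping
    const std::size_t bytes = count * sizeof(std::uint32_t);

    int fd = ::open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        throw std::runtime_error("open failed");

    if (::ftruncate(fd, static_cast<off_t>(bytes)) == -1) {
        ::close(fd);
        throw std::runtime_error("ftruncate failed");
    }

    void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED)
        throw std::runtime_error("mmap failed");

    MappedInts result = { static_cast<std::uint32_t*>(p), count };
    return result;
}

// Flushes modified pages to the file and removes the mapping.
void unmapInts(MappedInts& m)
{
    const std::size_t bytes = m.count * sizeof(std::uint32_t);
    ::msync(m.data, bytes, MS_SYNC);
    ::munmap(m.data, bytes);
    m.data = nullptr;
    m.count = 0;
}
```

After `mapInts(filename, 1470000000)` the returned `data` pointer can be used like the original in-memory array. "Writing" is just assigning to its elements, and "reading" in a later run is mapping the same file again.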
In general, if the time is going to reading and writing, you
will want to investigate using mmap. If it is going to
formatting and parsing, you will want to investigate some sort
of binary format; this could also help reading and writing
if it makes the resulting files smaller. The simplest binary
format, writing the values in network byte order (big-endian),
requires no more than writing the four bytes of each value, most
significant first, and reassembling them on input.
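A minimal sketch of such a format, one possible version rather than a definitive one. The function names `writeInt`, `readInt` and `roundTrip` are illustrative, and the streams are assumed to have been opened with `std::ios::binary`:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Writes one value as four bytes, most significant byte first.
void writeInt(std::ostream& dest, std::uint32_t value)
{
    dest.put(static_cast<char>((value >> 24) & 0xFF));
    dest.put(static_cast<char>((value >> 16) & 0xFF));
    dest.put(static_cast<char>((value >>  8) & 0xFF));
    dest.put(static_cast<char>( value        & 0xFF));
}

// Reads four bytes and reassembles them into one value.
// There is no error checking: if the stream ends early, get() returns
// EOF and the result is meaningless.
std::uint32_t readInt(std::istream& source)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i)
        result = (result << 8) | static_cast<std::uint32_t>(source.get() & 0xFF);
    return result;
}

// Usage example: round-trips one value through an in-memory stream.
bool roundTrip(std::uint32_t value)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    writeInt(buffer, value);
    return readInt(buffer) == value;
}
```

Because the byte order is fixed by the shifts rather than by the machine, the file reads back identically on little-endian and big-endian hosts. Every value takes exactly four bytes, so 1.47 billion entries come to about 5.9 GB.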
(Some error checking obviously needs to be added.)
If many of the integers are actually small, you could try some
variable length encoding, such as that used in Google Protocol
Buffers. If most of your integers are in the range -64…63,
this could result in a file only a quarter of the size (which,
again, will reduce the time needed to read and write).
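To make the variable length idea concrete, here is a rough sketch in the style of the Protocol Buffers wire format: seven payload bits per byte with the high bit as a continuation flag, plus the ZigZag mapping for signed values. The function names are made up for illustration, the decoder assumes well-formed input, and for the unsigned integers in the question the ZigZag step can be skipped (one byte then covers 0…127).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ZigZag maps small magnitudes to small unsigned values:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... so -64..63 becomes 0..127.
std::uint32_t zigzagEncode(std::int32_t n)
{
    return (static_cast<std::uint32_t>(n) << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

// Inverse of zigzagEncode. The final conversion relies on two's
// complement wrap-around, which is guaranteed only from C++20 on.
std::int32_t zigzagDecode(std::uint32_t u)
{
    return static_cast<std::int32_t>((u >> 1) ^ (0u - (u & 1u)));
}

// Appends `value` using 7 bits per byte, least significant group first;
// the high bit of each byte says whether another byte follows.
void appendVarint(std::vector<unsigned char>& out, std::uint32_t value)
{
    while (value >= 0x80) {
        out.push_back(static_cast<unsigned char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<unsigned char>(value));
}

// Decodes one value starting at `pos` and advances `pos` past it.
// Assumes the buffer holds a complete, well-formed encoding.
std::uint32_t readVarint(const std::vector<unsigned char>& in, std::size_t& pos)
{
    std::uint32_t value = 0;
    int shift = 0;
    for (;;) {
        const unsigned char byte = in[pos++];
        value |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;
        shift += 7;
    }
    return value;
}
```

The trade-off is that values needing more than 28 bits after the mapping take five bytes instead of four, so this only pays off when small values really do dominate.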