I’m working on a project that has me in a bit over my head performance-wise. I’m tasked with reading large (50 MB or so) files of particle coordinates and displaying them. I’d like to use C++ for this because I’m already learning it.
The coordinate format in the files is simple; there are just a lot of them (say a million or so):
1234.5667 5234.1566 //coordinate 1
8532.6123 5152.6612 //coordinate 2
....
Being a noob, I just want to read the files line by line and store the values in vectors. Is this wrong? Or should I read the whole file into a buffer first and then parse the values?
Working example:
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    clock_t c1 = clock();
    vector<double> coords;
    double coord;
    ifstream fin("file.txt");
    while (fin >> coord) {  // operator>> skips spaces and newlines between values
        coords.push_back(coord);
    }
    cout << "done. " << coords.size()/2 << " coords read.\n";
    cout << "took " << (clock() - c1)/(double)CLOCKS_PER_SEC << " seconds." << endl;
}
And the corresponding output on a 40 MB file with 2 million coordinates:
done. 2000000 coords read.
took 1.74 seconds.
That seems fast to me, but I suspect I’m not a good judge of this.
You might want to preallocate the vector with .reserve() if you have an idea of how large the “average” file is; that avoids the repeated reallocations push_back triggers as the vector grows.
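A minimal sketch; the count here is just a guess based on your 2-million-coordinate example (two doubles per coordinate):

vector<double> coords;
coords.reserve(4000000);  // assumed typical size: 2M coordinates * 2 values each
// ... read as before; push_back won't reallocate until the reserved capacity is exceeded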
Efficiency is a tricky game. Don’t play tricks early on; design a good basic algorithm first. If it’s not fast enough, start looking at the IO routines and at whether you’re creating any “extra” objects (explicitly or implicitly, especially if you’re passing parameters around).
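To address the buffered-read question from the post: one common alternative is to slurp the whole file into memory and parse it there. This is only a sketch, and it assumes the file contains nothing but whitespace-separated numbers (the // comments above being annotations, not file content):

#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;

vector<double> readAll(const char* path) {
    // Pull the entire file into one string in a single read.
    ifstream fin(path, ios::binary);
    string data((istreambuf_iterator<char>(fin)), istreambuf_iterator<char>());

    // Parse doubles in memory; strtod skips leading whitespace itself
    // and sets end == p when no more numbers can be read.
    vector<double> coords;
    const char* p = data.c_str();
    char* end;
    for (double v = strtod(p, &end); p != end; v = strtod(p, &end)) {
        coords.push_back(v);
        p = end;
    }
    return coords;
}

Whether this actually beats the istream version depends on your standard library implementation; it’s worth timing both against the same file.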
In your example, you might want to call clock() a second time before printing the summary output, so the time spent writing to cout isn’t counted in the measurement. Slightly more accurate timing! 🙂
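Something like this, changing only the timing lines of the example above:

clock_t c2 = clock();  // stop the clock before doing any output
cout << "done. " << coords.size()/2 << " coords read.\n";
cout << "took " << (c2 - c1)/(double)CLOCKS_PER_SEC << " seconds." << endl;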