I have data that looks like this, for example:
34 foo
34 bar
34 qux
62 foo1
62 qux
78 qux
These are sorted based on the first column.
What I want to do is process the lines that start with 34, but I also want the file iteration to stop once there are no more 34s, without having to scan through the whole file. How would I do this?
The reason is that the number of lines to be processed is very large (~10^7), and the lines that start with 34 make up only around 1-10% of them.
I am aware that I can grep the lines and write them to another file, but this is tedious and consumes extra disk space.
This code illustrates my failed attempt using ‘continue’:
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    string line;
    ifstream myfile("mydata.txt");
    vector<vector<string> > dataTable;

    if (myfile.is_open())
    {
        // Read one line at a time; the loop ends at end of file
        while (getline(myfile, line))
        {
            stringstream ss(line);
            int FirstCol;
            string SecondCol;
            ss >> FirstCol >> SecondCol;

            if (FirstCol != 34)
            {
                continue;
            }
            // This will skip those other than 34
            // but will still iterate through all the file
            // until the end.

            // Some processing to FirstCol and SecondCol
            cout << FirstCol << '\t' << SecondCol << endl;
        }
        myfile.close();
    }
    else
        cout << "Unable to open file";

    return 0;
}
Assuming the file is sorted by FirstCol, use a state variable that records whether or not you have found the first matching line yet. Once you have found it, as soon as you read a line whose first column is != 34, you can break out of the loop, because a sorted file cannot contain any more 34s after that point.
For example, suppose your data is now:
…this code will do what you want: