I’m looking for some general advice on the most efficient way to go about creating a data-trawling routine. I have a basic knowledge of C++.
I need to create a routine to search through a text file which has the following format (example):
4515397 404.4 62.5 1607.0 2.4 0.9 ...
4515398 404.4 62.3 1607.0 3.4 1.2 ...
4515399 404.4 62.2 1608.0 4.6 0.8 ...
4515400 405.1 62.2 1612.0 5.8 0.2 ...
4515401 405.9 62.2 1615.0 6.9 -0.8 ...
4515402 406.8 62.2 1617.0 8.0 -2.7 ...
4515403 406.7 62.1 1616.0 9.0 -5.3 ...
In the above example, I want to export the average values of columns 2 and 3, taken over the rows where columns 5 and 6 are both less than 4. I am not actually interested in the values in columns 1, 4 or 7 (the ellipses are exactly how they appear in the file itself).
To further complicate matters, occasionally random strings of text appear in the file, like this (these can be thrown away):
4522787 429.6 34.4 2024.0 . . ...
4522788 429.9 34.2 2022.0 . . ...
4522789 429.9 34.1 2022.0 . . ...
EFIX R 4522633 4522789 157 427.9 36.8 2009
4522790 429.3 34.2 2021.0 . . ...
END 4522791 SAMPLES EVENTS RES 23.91 23.82
MSG 4522799 TRIAL_RESULT 0
MSG 4522799 TRIAL OK
Finally, each text file contains five sets of data, and I intend to average the values within each set separately. Each of these five data sets is bounded by lines like this:
MSG 4502281 START_GRAB
and
MSG 4512283 END_GRAB
Everything outside these bounds can be thrown away.
So, as a relatively inexperienced programmer, I’m starting to look at the most efficient ways of achieving this. What would be my best approach? Is C++ needlessly complicated for this sort of task? Perhaps there is even a utility already available that can do this sort of data-trawling?
It just occurs to me now that I could potentially use a Microsoft Excel script to do this for me. I’d like to know any thoughts on this.
I’d start with the naive approach and see how far I’d get: