I am working on this project where I need to read in a lot of data from .dat files and use the data to perform simulations. The data in my .dat file looks as follows:
DeviceID InteractingDeviceID InteractionStartTime InteractionEndTime
1 2 1101 1105
The values 1, 2, 1101, and 1105 are tab-delimited, and the row means that Device 1 interacted with Device 2 starting at 1101 ms and ended the interaction at 1105 ms.
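To make the format concrete, here is a minimal sketch of parsing one such row in C++. It assumes every field is an integer and that the times fit in a long; InteractionRecord and parse_record are hypothetical names, not part of any library.

```cpp
#include <sstream>
#include <string>

// One row of the trace file (times in ms). Field names are hypothetical,
// chosen to mirror the header line.
struct InteractionRecord {
    int device_id = 0;
    int interacting_device_id = 0;
    long start_ms = 0;
    long end_ms = 0;
};

// Parses one tab-delimited line such as "1\t2\t1101\t1105".
// Returns false if the line does not start with four numeric fields
// (for example, the header line).
bool parse_record(const std::string& line, InteractionRecord& out) {
    std::istringstream fields(line);
    // operator>> skips any whitespace, including tabs, between fields.
    return static_cast<bool>(fields >> out.device_id >> out.interacting_device_id
                                    >> out.start_ms >> out.end_ms);
}
```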
I have trace data sets that contain thousands of such interactions, and my job is to analyze them.
The first step is to parse the file. The language of choice is C++. The approach I was thinking of taking is to read the file and, for every line read, create a Device object.
This Device object will contain a DeviceID property and an array/vector of structs listing all the devices that the given DeviceID interacted with over the course of the simulation. Each struct will contain the interacting device ID, the interaction start time, and the interaction end time.
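A minimal sketch of the data structure described above, assuming integer IDs and millisecond timestamps stored as long; the names Interaction, Device, and add_interaction are illustrative only.

```cpp
#include <vector>

// One interaction from the point of view of the owning device (times in ms).
struct Interaction {
    int interacting_device_id;
    long start_ms;
    long end_ms;
};

// A device and every interaction it took part in during the trace.
class Device {
public:
    explicit Device(int id) : id_(id) {}

    int id() const { return id_; }
    const std::vector<Interaction>& interactions() const { return interactions_; }

    void add_interaction(int other_id, long start_ms, long end_ms) {
        interactions_.push_back({other_id, start_ms, end_ms});
    }

private:
    int id_;
    // A vector stores the structs contiguously, so per-element overhead is small.
    std::vector<Interaction> interactions_;
};
```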
I have a two-fold question here:
- Is my approach correct?
- If I am on the right track, how do I rapidly parse these tab-delimited data files and create Device objects in C++ without excessive memory overhead?
A push in the right direction will be much appreciated.
Thanks
Your approach seems to be correct given the information you’ve provided.
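I’m assuming you’d be creating a Device class that holds the device ID and a vector of interaction structs, each with the interacting device’s ID and the interaction start and end times.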
At that point, you should be able to read in the file, one line at a time, and pull out the data.
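One way to do the line-by-line read, sketched under the assumption that every data line has four whitespace-separated integers: read with std::getline, split each line with std::istringstream, and group interactions by DeviceID in a std::unordered_map rather than separate Device objects. load_trace and Interaction are hypothetical names. Taking a std::istream means the same function works for a std::ifstream or an in-memory string.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal per-interaction record (times in ms); names are illustrative.
struct Interaction {
    int interacting_device_id;
    long start_ms;
    long end_ms;
};

// Reads a trace one line at a time and groups interactions by DeviceID.
// Lines that do not hold four numeric fields (e.g. the header) are skipped.
std::unordered_map<int, std::vector<Interaction>> load_trace(std::istream& in) {
    std::unordered_map<int, std::vector<Interaction>> by_device;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int device_id, other_id;
        long start_ms, end_ms;
        if (!(fields >> device_id >> other_id >> start_ms >> end_ms)) {
            continue;  // header or malformed line
        }
        // operator[] creates an empty vector the first time a device is seen.
        by_device[device_id].push_back({other_id, start_ms, end_ms});
    }
    return by_device;
}
```

With a real file you would pass something like std::ifstream file("trace.dat"), where the file name is a placeholder. Only one line is held in memory at a time; the stored interactions are what grow with the trace size.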
Code is untested and for illustration purposes only, but you can get the idea. The trick would be writing the find_device_by_id function (which would return a pointer to the device object with a matching id field). This shouldn’t require much memory overhead per input line; if your input files are huge, you may not be able to hold all the data in memory and may have to store it in a database instead.
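For find_device_by_id, a linear scan over a list of devices gets slow once there are thousands of devices. A hash map keyed by device ID gives average constant-time lookup; this is a swapped-in choice, not necessarily what the answer had in mind. Device, DeviceTable, and find_or_create_device are hypothetical names.

```cpp
#include <unordered_map>
#include <vector>

struct Interaction {
    int interacting_device_id;
    long start_ms;
    long end_ms;
};

struct Device {
    int id;
    std::vector<Interaction> interactions;
};

// Devices keyed by id. Pointers to elements of an unordered_map stay valid
// when the map rehashes, so a returned Device* remains usable until that
// element is erased.
using DeviceTable = std::unordered_map<int, Device>;

// Returns the device with a matching id, or nullptr if there is none.
Device* find_device_by_id(DeviceTable& devices, int id) {
    auto it = devices.find(id);
    return it == devices.end() ? nullptr : &it->second;
}

// Returns the existing device, or inserts a new empty one and returns that.
Device* find_or_create_device(DeviceTable& devices, int id) {
    auto result = devices.emplace(id, Device{id, {}});
    return &result.first->second;  // existing element if the id was already present
}
```

While parsing, you would call find_or_create_device(devices, device_id) for each line and push the interaction onto the returned device.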