I’m trying to parse a text file that has a header and a body. The header contains line-number references to sections of the body. For example:
SECTION_A 256
SECTION_B 344
SECTION_C 556
This means that SECTION_A starts at line 256.
What would be the best way to parse this header into a dictionary and then read the sections only when necessary?
Typical scenarios would be:
- Parse the header and read only section SECTION_B
- Parse the header and read the first paragraph of each section.
The data file is quite large, and I definitely don’t want to load all of it into memory and then operate on it.
I’d appreciate your suggestions. My environment is VS 2008 and C# 3.5 SP1.
Obviously you can store the name and line number in a dictionary, but on its own that won’t do you much good.
Sure, it tells you which line to start reading from, but where in the file is that line? Because lines vary in length, the only way to find out is to read from the beginning and count line breaks.
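As a minimal sketch of that counting approach, assume the header is a block of "NAME LINE" pairs at the top of the file that ends at the first line not matching that shape, that line numbers are 1-based and counted from the very start of the file, and that a blank line ends a paragraph (the question doesn’t pin down any of these). StreamReader.ReadLine keeps only the current line in memory, so neither scenario requires loading the whole file. SectionReader, ParseHeader, ReadSection and FirstParagraphs are hypothetical names, and the code sticks to C# 3.0 / .NET 3.5 features.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper. Assumes: header = leading "NAME LINE" pairs, ending at the
// first non-matching line; 1-based line numbers; a blank line ends a paragraph.
public static class SectionReader
{
    // Reads only the header lines and maps section name -> starting line number.
    public static Dictionary<string, int> ParseHeader(string path)
    {
        var sections = new Dictionary<string, int>();
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
                int start;
                if (parts.Length != 2 || !int.TryParse(parts[1], out start))
                    break; // first non-header line: stop reading
                sections[parts[0]] = start;
            }
        }
        return sections;
    }

    // Streams the file, counting lines, and yields only the lines of one section.
    // A section is assumed to end where the next section (by line number) begins.
    public static IEnumerable<string> ReadSection(string path, Dictionary<string, int> sections, string name)
    {
        int start = sections[name];
        int end = sections.Values.Where(v => v > start).DefaultIfEmpty(int.MaxValue).Min();
        using (var reader = new StreamReader(path))
        {
            string line;
            int lineNo = 0;
            while ((line = reader.ReadLine()) != null)
            {
                lineNo++;
                if (lineNo >= end) yield break;        // reached the next section
                if (lineNo >= start) yield return line; // inside the requested section
            }
        }
    }

    // One forward pass: for each section, collect lines from its start line
    // up to the first blank line.
    public static Dictionary<string, List<string>> FirstParagraphs(string path, Dictionary<string, int> sections)
    {
        // Invert the header: start line -> section name (assumes distinct start lines).
        var nameAtLine = sections.ToDictionary(kv => kv.Value, kv => kv.Key);
        var result = new Dictionary<string, List<string>>();
        List<string> current = null; // paragraph being collected, or null
        using (var reader = new StreamReader(path))
        {
            string line;
            int lineNo = 0;
            while ((line = reader.ReadLine()) != null)
            {
                lineNo++;
                string name;
                if (nameAtLine.TryGetValue(lineNo, out name))
                {
                    current = new List<string>(); // a new section starts here
                    result[name] = current;
                }
                if (current == null)
                    continue; // not inside a first paragraph: skip
                if (line.Trim().Length == 0)
                {
                    current = null; // blank line ends the first paragraph
                    if (result.Count == sections.Count)
                        break; // every section handled: stop early
                }
                else
                {
                    current.Add(line);
                }
            }
        }
        return result;
    }
}
```

The catch is the one described above: reading SECTION_B this way still means reading past every line before it. That is fine for a single pass, but it gets expensive if you jump between sections repeatedly.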
A better way would be to write a wrapper that decodes the text contents (handling any encoding issues) and builds a mapping from line number to byte position. You could then take the line number, 256, look it up in that mapping to learn that line 256 starts at, say, byte 10000 in the file, and start reading from there.
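Here is a rough sketch of such a line-to-byte-offset index. It rests on stronger assumptions: the file is ASCII or UTF-8 (not UTF-16), and lines end in \n or \r\n. In those encodings the byte 0x0A only ever represents '\n', so line starts can be found by scanning raw bytes without decoding anything. Building the index still reads the file once and keeps roughly one long (8 bytes) per line in memory, but every later lookup is a single Seek. LineIndex and ReadLines are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper. Assumes an ASCII or UTF-8 file with "\n" or "\r\n" line
// endings: in those encodings byte 0x0A is always '\n', so no decoding is needed.
public sealed class LineIndex
{
    private readonly string path;
    // lineStarts[n - 1] holds the byte offset where line n (1-based) begins.
    private readonly List<long> lineStarts = new List<long>();

    // One sequential pass over the raw bytes, in 64 KB chunks.
    public LineIndex(string path)
    {
        this.path = path;
        lineStarts.Add(0); // line 1 starts at byte 0
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[64 * 1024];
            long position = 0; // file offset of buffer[0]
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                        lineStarts.Add(position + i + 1); // next line begins after '\n'
                }
                position += read;
            }
        }
    }

    // Seeks directly to startLine (1-based) and decodes at most count lines.
    public List<string> ReadLines(int startLine, int count)
    {
        var lines = new List<string>();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            stream.Seek(lineStarts[startLine - 1], SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                string line;
                while (lines.Count < count && (line = reader.ReadLine()) != null)
                    lines.Add(line);
            }
        }
        return lines;
    }
}
```

With the header from the question, new LineIndex(path).ReadLines(256, 10) would return the first ten lines of SECTION_A without decoding anything before it. The index only pays off if you read several sections, or read them more than once. For a single forward pass, plain line counting does the same work.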
Is this a one-off processing job? If not, have you considered loading the entire file into a local database such as SQLite? That would give you a direct mapping from line number to line contents. Of course, the database file would be even bigger than your original file, and you’d need to copy the data from the text file into the database, so there’s some overhead either way.