I have large log files that contain a timestamp every second. What I need is to cut a user-defined part out of such a huge file and save it in another text file. I am confused, since the fstream class can deal with a maximum file size of 2 GB, and reading all the lines is a time and memory disaster.
Timestamp pattern: !<<dd.mm.yyyy hh:min:sec>, one every second and one per line.
One prof. guy suggested using LINQ and ReadLine().
A sample of the file:
!<<14.12.2012 16:20:03>
some text some text some
some text some text some
some text some text some
!<<14.12.2012 16:20:04>
some text some text some
some text some text some
some text some text some
some text some text some
some text some text some
!<<14.12.2012 16:20:05>
some text some text some
!<<14.12.2012 16:20:06>
some text some text some
some text some text some
And so on until EOF.
ReadLine is not at all what you want to do here. Open a file stream, seek to the position you want, and read the data you want out into another file stream.
ReadLine has to actually read the data, whereas seeking (myStream.Position = whereIWantToGo) is basically instant.
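A minimal sketch of the seek-then-copy idea, assuming you already know the byte offsets where the wanted part starts and ends. The names LogSlicer and CopyRange and the buffer size are made up for illustration. FileStream.Position is a long, so offsets past 2 GB are not a problem.

```csharp
using System;
using System.IO;

static class LogSlicer
{
    // Copies bytes [start, end) of sourcePath into destPath.
    // Nothing before 'start' is ever read.
    public static void CopyRange(string sourcePath, string destPath, long start, long end)
    {
        using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (var dest = new FileStream(destPath, FileMode.Create, FileAccess.Write))
        {
            source.Position = start;          // the seek: no data is read here
            var buffer = new byte[81920];     // arbitrary chunk size
            long remaining = end - start;

            while (remaining > 0)
            {
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int read = source.Read(buffer, 0, toRead);
                if (read == 0) break;         // reached EOF before 'end'
                dest.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}
```

Because the copy works on raw bytes, the encoding of the log does not matter for this step.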
You would handle this the same way you would a sorted database. Finding a record in a DB with 1,000,000 records only takes about 20 "seek" operations, because each seek halves the remaining range and log2(1,000,000) is roughly 20. Start halfway. Too high? You just ruled out 500,000 records. Go back halfway. Too high again? You just ruled out 250,000 more. Rinse, repeat.
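Here is a rough sketch of that halving search applied to the log format from the question. It is an illustration under assumptions, not a tested tool: it assumes the timestamps are in ascending order, that the file uses a single-byte-compatible encoding (ASCII, UTF-8, Latin-1) so the marker is three plain bytes, and that "!<<" never appears inside the ordinary text lines. TimestampSearch, NextMarker, and FindOffset are invented names.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

static class TimestampSearch
{
    const string Marker = "!<<";
    const string Format = "dd.MM.yyyy HH:mm:ss";   // 19 characters

    // Scans forward from 'from' for the next marker followed by a parsable
    // timestamp. Returns the marker's byte offset, or -1 if there is none.
    static long NextMarker(FileStream fs, long from, out DateTime stamp)
    {
        stamp = default(DateTime);
        fs.Position = from;
        var text = new byte[Format.Length];
        int matched = 0;                           // marker characters matched so far
        int b;
        while ((b = fs.ReadByte()) != -1)
        {
            if (b == Marker[matched]) matched++;
            else matched = (b == '!') ? 1 : 0;
            if (matched < Marker.Length) continue;
            matched = 0;

            long markerStart = fs.Position - Marker.Length;
            int got = 0, n;
            while (got < text.Length && (n = fs.Read(text, got, text.Length - got)) > 0)
                got += n;
            if (got == text.Length &&
                DateTime.TryParseExact(Encoding.ASCII.GetString(text), Format,
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out stamp))
                return markerStart;
            fs.Position = markerStart + Marker.Length;   // false hit, keep scanning
        }
        return -1;
    }

    // Returns the byte offset of the first block whose timestamp is >= target,
    // or the file length if every block is older than target.
    public static long FindOffset(string path, DateTime target)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long lo = 0, hi = fs.Length;
            DateTime stamp;
            while (lo < hi)
            {
                long mid = lo + (hi - lo) / 2;
                long marker = NextMarker(fs, mid, out stamp);
                if (marker < 0 || stamp >= target)
                    hi = mid;            // the wanted block starts at or before this marker
                else
                    lo = marker + 1;     // this marker and everything before it is too old
            }
            long result = NextMarker(fs, lo, out stamp);
            return result < 0 ? fs.Length : result;
        }
    }
}
```

Calling FindOffset once with the user's start time and once with the end time gives the two byte offsets that bound the part to cut out. Each probe only reads from the midpoint to the next timestamp line, which is short when there is one timestamp per second.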
If you find funny characters (bad encoding)
Per your email (btw, you should really continue to use S.O., not email, so that other people can benefit): the answer is that you need to try different encoding types. Your file may not be encoded as UTF-8 (which is what my code below is expecting). So, use new StreamReader("MyLogFile.txt", Encoding.ASCII), or some other encoding, until it works for you.
C# console app that should get you started
Disclaimer: this code is nasty, and might have bugs where there is an infinite loop :) But here is a console app that should work for you.