I have a program that reads each line of a file and extracts data according to a specific format, defined by a regular expression. Instead of calling Match() once for each line in the file, I could call Match() against the entire contents of the file. Which is the more efficient solution?
The latter choice would require the RegexOptions.Multiline option.
Update:
The file is specified by the end-user so it could be large (~37000 lines, ~2MB). It is not necessary for every line to contain a valid entry.
The regular expression I’m using is ^\s*(OPTL_\w*)\s*=>\s*(\d+)\s*$. For example, this would match a line consisting of the text OPTL_Example => 123, but would not match a line consisting of the text FooBar => 999.
It depends on whether you are optimizing for speed or stability.
If this is an end-user app and you don’t have control over file size or memory, then I would take the safe route and read line by line to protect memory. Be sure to build the regex outside the loop so that you are only calling .Match inside the loop. ReadLine is pretty fast.
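A minimal sketch of the line-by-line route, using the regex from the question. The class and method names (LineByLineParser, Parse) are hypothetical, RegexOptions.Compiled is an optional assumption rather than something the question requires, and the captured number is kept as a string so the sketch makes no assumption about its range.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class LineByLineParser
{
    // Built once, outside the loop, so the loop body only calls Match().
    // RegexOptions.Compiled is an optional assumption; it trades startup cost for faster matching.
    private static readonly Regex EntryPattern =
        new Regex(@"^\s*(OPTL_\w*)\s*=>\s*(\d+)\s*$", RegexOptions.Compiled);

    // Returns the (name, value) pair of every line that matches; other lines are skipped.
    public static List<KeyValuePair<string, string>> Parse(IEnumerable<string> lines)
    {
        var entries = new List<KeyValuePair<string, string>>();
        foreach (string line in lines)
        {
            Match m = EntryPattern.Match(line);
            if (m.Success)
            {
                entries.Add(new KeyValuePair<string, string>(
                    m.Groups[1].Value, m.Groups[2].Value));
            }
        }
        return entries;
    }
}
```

Calling LineByLineParser.Parse(File.ReadLines(path)) streams the file, because File.ReadLines enumerates lazily and only the current line is held in memory at a time.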
You could set up some parallel processing so that it reads the next line while it parses the current one, but that simple regex would be so fast that I'm not sure it would help. Whether you read a line at a time or the entire file, the disk I/O to read the file is most likely the slowest operation.
If this is a server app with limited distribution and speed is critical, then read it all in.
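For comparison, a sketch of the read-it-all-in route with RegexOptions.Multiline, as the question proposes. The names WholeFileParser, ParseText and ParseFile are hypothetical; the pattern is the one from the question, unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class WholeFileParser
{
    // Multiline makes ^ and $ match at the start and end of each line,
    // not only at the start and end of the whole input.
    private static readonly Regex EntryPattern =
        new Regex(@"^\s*(OPTL_\w*)\s*=>\s*(\d+)\s*$", RegexOptions.Multiline);

    // Runs the pattern once over the whole text and collects every match.
    public static List<KeyValuePair<string, string>> ParseText(string text)
    {
        var entries = new List<KeyValuePair<string, string>>();
        foreach (Match m in EntryPattern.Matches(text))
        {
            entries.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return entries;
    }

    // Loads the entire file into memory first, which is the cost this approach pays.
    public static List<KeyValuePair<string, string>> ParseFile(string path)
    {
        return ParseText(File.ReadAllText(path));
    }
}
```

One caveat with this version: \s also matches line breaks, so an entry split across two lines (OPTL_A on one line and => 5 on the next) would match here but not in a line-by-line loop. If that matters, the \s parts would need to be restricted to spaces and tabs.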