I am working on a project to parse out a text file. The file is output from networking equipment. The incoming string is anywhere from a few thousand to tens of thousands of lines long. There will be a variable number of entries with keywords like these:
fcN/N is up
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN)
Port WWN is 20:52:00:0d:ec:ef:b0:40
Admin port mode is F, trunk mode is on
snmp link state traps are enabled
Port vsan is 10
fcipN is up
.....
port-channel-N is trunking
......
The N is a number. There will always be ‘fcN/N’ entries; the other two may or may not be present. The ‘fcip’ and ‘port-channel’ entries have similar status information after each one, just like the fcN/N entries. All entries of the same type are grouped: there won’t be an fc followed by an fcip followed by another fc. As a general rule, all the fc entries are listed first, then all the port-channel entries, then all the fcip entries, but I don’t want to assume that order.
At the moment I have about 7 different RegEx patterns I am looking for. I examine each line in turn and test it against every pattern, but managing all of that is cumbersome. I thought about splitting the string on newlines and then using some kind of LINQ select to pull out each of the 3 types of entries, but that assumes they are always grouped in the same order. I also thought about 3 monster regexes that match everything from one entry to the next, but in my experience those are tough to get working and almost unreadable. Another idea was to first match the three keywords (fc, port-channel or fcip) and then use an if statement to apply only the patterns unique to that type. That still means testing each line against all 3 keyword patterns, though.
To be clear, I have the Regex patterns working. I am looking for a more efficient way to do this than testing each line for 6 or 8 matches.
Any other ideas?
I have two thoughts:
(1) Your last approach of using if statements to first find the right regex to apply is likely to be quite efficient. I’d recommend it.
(2) You can compose regexes like this:
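Here is a minimal sketch of that kind of composition, assuming C#. The class and fragment names (PortPatterns, FcName and so on) are made up for illustration, and the header format is guessed from the sample lines in the question, so adjust it to the real output:

```csharp
using System;
using System.Text.RegularExpressions;

static class PortPatterns
{
    // Small named fragments. All names here are hypothetical.
    const string Number = @"\d+";
    const string FcName = "fc" + Number + "/" + Number;       // e.g. fc1/2
    const string FcipName = "fcip" + Number;                  // e.g. fcip3
    const string ChannelName = "port-channel-" + Number;      // e.g. port-channel-4
    const string State = @"(?<state>\w+)";                    // e.g. up, trunking

    // The fragments are combined into one header pattern.
    // Assumption: a header line looks like "<name> is <state>".
    public static readonly Regex Header = new Regex(
        "^(?<name>" + FcName + "|" + FcipName + "|" + ChannelName + ") is " + State,
        RegexOptions.Compiled);
}
```

For example, `PortPatterns.Header.Match("fc1/2 is up")` succeeds, with the `name` group holding `fc1/2` and the `state` group holding `up`.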
This makes it much more readable.
If you never want to find a match that spans lines you should split the file into lines first. That will improve efficiency because the regexes have smaller inputs and will backtrack less.
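Combining the line split with suggestion (1) could look roughly like the sketch below. It is only an outline under assumptions: LineDispatcher, the field names and the two detail patterns are hypothetical (the patterns are based on the sample lines in the question), and the fcip and port-channel pattern lists are left empty for you to fill in. The point is that each line is tested against one cheap header regex, and after that only against the patterns for the entry type currently being read.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LineDispatcher
{
    // One cheap test per line: does it start a new entry, and of which kind?
    // "fcip" is listed before "fc" so that "fcip1" is not reported as "fc".
    static readonly Regex Header =
        new Regex(@"^(?<kind>fcip|fc|port-channel)-?\d", RegexOptions.Compiled);

    // Patterns that are only tried while inside an entry of the given kind.
    static readonly Dictionary<string, (string Field, Regex Rx)[]> DetailPatterns =
        new Dictionary<string, (string Field, Regex Rx)[]>
        {
            ["fc"] = new[]
            {
                ("wwn", new Regex(@"^Port WWN is (?<value>[0-9a-f:]+)")),
                ("vsan", new Regex(@"^Port vsan is (?<value>\d+)")),
            },
            // Placeholders: add the patterns specific to these types.
            ["fcip"] = new (string, Regex)[0],
            ["port-channel"] = new (string, Regex)[0],
        };

    public static List<(string Kind, string Field, string Value)> Parse(string text)
    {
        var results = new List<(string Kind, string Field, string Value)>();
        var current = "";
        var active = new (string Field, Regex Rx)[0];

        foreach (var raw in text.Split('\n'))
        {
            var line = raw.Trim();   // also drops a trailing '\r'
            var header = Header.Match(line);
            if (header.Success)
            {
                // Switch the active pattern set; no ordering of types is assumed.
                current = header.Groups["kind"].Value;
                active = DetailPatterns[current];
                continue;
            }
            foreach (var (field, rx) in active)
            {
                var m = rx.Match(line);
                if (m.Success)
                {
                    results.Add((current, field, m.Groups["value"].Value));
                    break;   // at most one pattern per line
                }
            }
        }
        return results;
    }
}
```

Because the active pattern set changes whenever a header line is seen, this does not depend on the fc, port-channel and fcip groups appearing in any particular order.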
If your matches span multiple lines but always start after a newline, you can split the string into chunks first like this:
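One possible way to do that split in C# is sketched below, assuming each entry begins with a header line like those in the question. ChunkSplitter and EntryStart are hypothetical names. The split pattern is zero-width (a multiline `^` followed by a lookahead), so the header line stays at the start of its chunk instead of being consumed by the split.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ChunkSplitter
{
    // Matches the empty position at the start of any line that begins
    // with one of the entry keywords followed by a digit.
    // Assumption: header lines are not indented.
    static readonly Regex EntryStart = new Regex(
        @"(?m)^(?=(?:fcip|fc|port-channel)-?\d)", RegexOptions.Compiled);

    public static string[] Split(string text)
    {
        // Regex.Split yields an empty first element when the text itself
        // starts with a header, so empty pieces are filtered out.
        return EntryStart.Split(text).Where(chunk => chunk.Length > 0).ToArray();
    }
}
```

Each chunk then contains one header line plus its status lines, and you can run only the regexes for that entry type against it.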