I am trying to use a regex to parse out the key-value pairs of command-line switches. Here’s what I’ve got so far:
(?<=(^-{1,2}| -{1,2}|^/| /))(?<name>[\w]+)[ :"]*(?<value>[\w.?=&+ :/|\\]*)(?=[ "]|$)
It seems to parse everything properly… almost. If there are hyphens in the value, it craps out on the match. How do I tweak this to work on all the test examples below?
Test examples (all valid):
-s -i:C:\Users\Fozzie\Workspace\TheProject\TheProject-Stack-1_5\db\DB Scripts\ -h:local:host -d:theDB
-o:"C:\temp\db\" -s -r -host:localhost --d theDB
-s -i:"C:\Users\Fozzie\Workspace\TheProject\TheProject-Stack-1_5\db\DB -Scripts\" -h:localhost -d:theDB
-s -d http://www.theproject.com -h:localhost -d:theDB
-i:"C:\Users\Fozzie\Workspace\TheProject\TheProject_Stack_1_5\db\DB Scripts\" --h:localhost -d:theDB
-h:localhost -i:"C:\Users\Fozzie\Workspace\TheProject\TheProject-Stack_1_5\db\DB Scripts\" -d:theDB
--d theDB -o:"C:\temp\db\" -host=local-host -r
The regex fails when the value part is something like
"C:\Users\Fozzie\Workspace\TheProject\TheProject-Stack-1_5\db\DB Scripts\"
or "local-host", because of the hyphens they contain: each hyphen is treated as the start of a new switch.
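One way to see the root cause is to test the value part of the regex on its own. Its character class, [\w.?=&+ :/|\\], contains no hyphen, so a value can never extend across one. The check below uses Python's re module only as a convenient way to run that class; for these plain ASCII inputs the class should behave the same way in .NET. The names VALUE_CLASS and WITH_HYPHEN are illustrative.

```python
import re

# The value character class from the question's regex, tested in isolation.
# '-' is not among the characters it allows.
VALUE_CLASS = re.compile(r'[\w.?=&+ :/|\\]*')

print(VALUE_CLASS.match('local-host').group())            # local
print(VALUE_CLASS.match('TheProject-Stack-1_5').group())  # TheProject

# Adding '-' to the class is not enough on its own: because the class also
# allows spaces, the value then runs on into the next switch.
WITH_HYPHEN = re.compile(r'[\w.?=&+ :/|\\-]*')
print(WITH_HYPHEN.match('local-host -r').group())         # local-host -r
```

Together, these suggest that the fix is not a bigger character class. The regex needs a rule for where a switch may begin, such as "a dash that follows white space".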
PS: I don’t want to use a canned options library like getopt. I’m interested in getting the regex right.
Thanks.
UPDATE: Sorry for the missing detail: this is a .NET regex.
.NET solution — speculative
This .NET solution is just a suggestion; I don’t do .NET and have no way of testing it on any of my machines (is there a regex-test web site for .NET?). I’ve taken the working Perl solution, removed the <mark> and <pad> parts that you’re not worried about, removed the comments, and flattened it all onto one line on the assumption that .NET doesn’t have an option for legibility analogous to Perl’s x option. You can still find 5 sets of parentheses corresponding to the 5 parts of the Perl regex. I’m assuming that (?:...) is a non-capturing group.
I also assume that .NET provides some mechanism analogous to Perl’s g modifier that allows you to scan the string on a second (or subsequent) pass where it left off on the previous pass, or that you can somehow determine where the end of the match was and resume the scan from there.
Perl solution — validated
This is as good as I’ve managed to come up with using Perl regexes (tested with Perl 5.16.0 on Mac OS X 10.7.5).
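For anyone who wants to experiment outside Perl, here is a rough Python 3 sketch of the same idea. It is an independent sketch, not a translation of the Perl script, and it has not been checked against .NET. The names SWITCH and parse_switches are made up for illustration.

The sketch makes these choices:
- The pattern uses re.VERBOSE, which is Python's counterpart to Perl's /x.
- A switch may begin only at the start of the line or after white space.
- A value is either a double-quoted string, or a string that starts with something other than a dash, double quote or white space, followed by a non-greedy run of non-quotes.
- re.finditer plays the role of Perl's g modifier: each scan resumes where the previous match ended. In .NET, Regex.Matches serves the same purpose.

```python
import re

# Hypothetical Python analogue of a marker/name/pad/value switch regex.
SWITCH = re.compile(r'''
    (?<!\S)                      # switch must start the line or follow white space
    (?P<mark>--?|/)              # marker: -, -- or /
    (?P<name>\w+)                # switch name
    (?P<pad>[:=]|\s+)?           # optional separator between name and value
    (?P<value>
        "[^"]*"                  # either a double-quoted string
      | [^-"\s][^"]*?            # or non-quotes, not starting with - " or white space
    )?                           # value is optional (e.g. a bare -s)
    (?=\s+(?:--?|/)\w|\s*$)      # followed by another switch or the end of the line
''', re.VERBOSE)


def parse_switches(line):
    """Return (name, value) pairs; value is None for a bare switch like -s."""
    pairs = []
    # finditer resumes each scan where the previous match ended (like Perl's g)
    for m in SWITCH.finditer(line):
        value = m.group('value')
        if value is not None and value.startswith('"'):
            value = value[1:-1]  # drop the surrounding double quotes
        pairs.append((m.group('name'), value))
    return pairs


if __name__ == '__main__':
    # Two of the sample lines from the question, processed one line at a time.
    SAMPLES = r'''
-s -d http://www.theproject.com -h:localhost -d:theDB
--d theDB -o:"C:\temp\db\" -host=local-host -r
'''
    for line in SAMPLES.strip().splitlines():
        print('Line:', line)
        for name, value in parse_switches(line):
            print(f'  name={name!r} value={value!r}')
```

One consequence of the "does not start with a dash" rule is that a value which really does begin with a dash (a negative number, say) is recognized only if it is quoted.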
The bulk of the script is not very interesting. The outer while loop reads the data section of the file (which is the material after the marker __DATA__) one line at a time, prints it for validation, then repeatedly runs the regex on the line to find the components (the marker, the name, the padding, and the value), printing those out. The bulk of the data is what was provided in the question (thank you!). The last three lines of the data are extra compared to what was originally provided.
All the excitement is in the regex. I’ve used Perl’s /x modifier to allow white space in the regex for readability. This means that white space is not significant unless preceded by a backslash or enclosed in square brackets (and there is no significant white space in this specimen). I’ve used the (?<name> ...) notation to identify the pieces as in the original, though the names could be omitted since they aren’t used. The (?# Was Cn) parts are pure comment.
--? would be another, shorter way to write it.
(?: ...) is a non-capturing grouping operator.
-s option in the first position of the first line of the sample data doesn’t have a value). It consists of: either a string starting with something other than a dash, double quote or white space, followed by a non-greedy string of non-quotes; or a double quote, a string of non-quotes, and another double quote.
The output is: