I’m not at all familiar with regular expressions, and I need a way to identify a subset of a matched string.
I asked a question previously about how to parse a string to extract date range values. One of the answers I received was very useful and pretty much gave me everything I needed to solve the problem at hand.
Part of the answer was this regular expression:
string pattern = @"\b(?<Year1>\d{4})(-(?<Year2>\d{2,4}))?\b";
This pattern lets me identify the first and second year substrings in the string I’m comparing through the named groups <Year1> and <Year2>, and in code all I need to do is:
searchTermMatch.Groups["Year1"].Value
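For readers new to named groups, here is a minimal, self-contained sketch of how a pattern like this is typically used with .NET's `Regex.Match`. The input string is the example given later in the question, and `searchTermMatch` mirrors the snippet above; the surrounding program is an assumption, not the asker's actual code.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question: a four-digit year, optionally followed by
// a hyphen and a second year of two to four digits.
string pattern = @"\b(?<Year1>\d{4})(-(?<Year2>\d{2,4}))?\b";
string input = "ThingOne ThingTwo 2006-2007 S12 RP";

Match searchTermMatch = Regex.Match(input, pattern);

// Named groups are looked up by name on Match.Groups.
string year1 = searchTermMatch.Groups["Year1"].Value; // "2006"
string year2 = searchTermMatch.Groups["Year2"].Value; // "2007"
Console.WriteLine($"Year1 = {year1}, Year2 = {year2}");

// When the optional "-YYYY" part is absent, the Year2 group does not
// participate: Success is false and Value is an empty string.
Match single = Regex.Match("ThingOne 2006 S12", pattern);
Console.WriteLine(single.Groups["Year2"].Success); // False
```

Checking `Group.Success` rather than comparing `Value` to an empty string makes it explicit whether the optional second year was present.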
However, I now need to identify the first part of the string. So if the string is
ThingOne ThingTwo 2006-2007 S12 RP
I need to be able to isolate “ThingOne ThingTwo” (which are only alphabetical characters – no numbers) in the same way I can isolate “2006” and “2007”.
I’ve tried changing the pattern to
string pattern = @"\b(<FirstPart>?<Year1>\d{4})(-(?<Year2>\d{2,4}))?\b";
but that didn’t work. Could somebody point out how I can achieve the result I need? Thanks.
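One way to see why that attempt fails: without a `?` immediately after the opening parenthesis, `(<FirstPart>?<Year1>\d{4})` is not a named group at all. .NET treats the `(` as an ordinary numbered capturing group, `<FirstPart` as literal text, `>?` as an optional `>`, and `<Year1>` as more literal text. The sketch below (same example string; the program around the pattern is an illustrative assumption) shows that the attempted pattern finds no match and defines no group named FirstPart or Year1.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "ThingOne ThingTwo 2006-2007 S12 RP";

// The attempted pattern from the question. Because there is no "?" after
// the first "(", "<FirstPart>?<Year1>" is matched as literal characters
// (with an optional ">"), not interpreted as group names.
var attempt = new Regex(@"\b(<FirstPart>?<Year1>\d{4})(-(?<Year2>\d{2,4}))?\b");

Match m = attempt.Match(input);
Console.WriteLine(m.Success); // False: the input contains no "<FirstPart" text

// Only Year2 is a real named group; the other groups are numbered.
string[] names = attempt.GetGroupNames();
Console.WriteLine(string.Join(", ", names));
```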
The syntax works because capturing groups in a regex are delimited by parentheses. The naming syntax (which, by the way, is not supported by every regex flavor) is
(?<name_of_match>pattern). So here we get three named matches:
- .+? = any character except a newline, one or more times, but no more times than necessary (lazy)
- \d{4} = any digit, exactly four times
- \d{2,4} = any digit, repeated two to four times
Also notice the added ^ character at the beginning: it anchors the match at the start of the string (or at the start of each line when RegexOptions.Multiline is set).
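The answer's full pattern did not survive in this copy. A pattern consistent with its description (a leading `^`, a lazy `.+?` group, then the two year groups) might look like the sketch below. The group name `FirstPart` comes from the question; the `\s*` between the text and the first year is an assumption added so the captured text excludes the trailing space. The greedy variant is included only for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed reconstruction: ^ anchors at the start of the string, the lazy
// .+? captures as little text as possible before the first year, and \s*
// absorbs the whitespace so it is not part of FirstPart.
string pattern = @"^(?<FirstPart>.+?)\s*\b(?<Year1>\d{4})(-(?<Year2>\d{2,4}))?\b";
string input = "ThingOne ThingTwo 2006-2007 S12 RP";

Match match = Regex.Match(input, pattern);
string firstPart = match.Groups["FirstPart"].Value; // "ThingOne ThingTwo"
string year1 = match.Groups["Year1"].Value;         // "2006"
string year2 = match.Groups["Year2"].Value;         // "2007"
Console.WriteLine($"[{firstPart}] [{year1}] [{year2}]");

// For comparison: a greedy .+ grabs as much as it can and backtracks only
// until the rest of the pattern still matches, so it swallows "2006-" and
// the engine then reports 2007 as Year1.
Match greedy = Regex.Match(input, @"^(?<FirstPart>.+)\s*\b(?<Year1>\d{4})(-(?<Year2>\d{2,4}))?\b");
Console.WriteLine(greedy.Groups["FirstPart"].Value); // "ThingOne ThingTwo 2006-"
```

The lazy quantifier is what keeps the first group from running into the date range; since the question says that part contains only letters, a character class such as `[A-Za-z ]+?` would be a stricter alternative.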