I am trying to convert the following regular expression from Java to .NET:
(?i:(?:([^\d,]+?)\W+\b((?:CA|SD|SC|CT|DC)\b)?\W*)?(\d{5}(?:[- ]\d{3,4})?)?)
When I run a match against the following string:
Mountain View, CA 94043
using a Pattern and Matcher object in Java, it populates four groups with the values:
"Mountain View, CA 94043"
"Mountain View"
"CA"
"94043"
However, in .NET, there are two matches. The first match populates the four groups with these values:
"Mountain " (there is a space at the end of group 0)
"Mountain"
""
""
The second match populates the four groups with these values:
"View, CA 94043"
"View"
"CA"
"94043"
I also tried the expression in RegexBuddy using both the Java and .NET modes, and in both modes it behaves like the .NET version.
Thanks everyone!
Add ^ to the beginning of your pattern and $ to the end of it to match the beginning and end of the string, respectively. This makes the pattern match the entire string and produces your desired result.
Since you didn’t restrict the pattern to an exact match, it found partial matches, especially since some of your groups are completely optional. Thus, it considers “Mountain” a match, then considers “View, CA 94043” as the next match.
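As a rough check of the anchoring advice, here is a minimal Java sketch that wraps the pattern from the question in ^ and $ and runs it with find(). The class name AnchoredMatchSketch and the helper firstMatchGroups are made up for illustration. The same pattern string should also be usable with .NET's Regex class, but that side is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class AnchoredMatchSketch {
    // The pattern from the question with ^ added in front and $ added at the end.
    static final Pattern ANCHORED = Pattern.compile(
        "^(?i:(?:([^\\d,]+?)\\W+\\b((?:CA|SD|SC|CT|DC)\\b)?\\W*)?(\\d{5}(?:[- ]\\d{3,4})?)?)$");

    // Returns groups 0..3 of the first match, or null when nothing matches.
    static String[] firstMatchGroups(String input) {
        Matcher m = ANCHORED.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // null if an optional group did not participate
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String g : firstMatchGroups("Mountain View, CA 94043")) {
            System.out.println("\"" + g + "\"");
        }
    }
}
```

With the anchors in place, the lazy first group can no longer stop at "Mountain", so the single match covers the whole input and the groups come out as "Mountain View", "CA" and "94043".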
EDIT: as requested in the comments, I’ll try to point out the differences between the Java and .NET regex approaches.
In Java the matches() method returns true only if the pattern matches the whole string. Thus it doesn’t require the pattern to be modified with boundary anchors or atomic zero-width assertions. In .NET there is no equivalent method that will do this for you. Instead, you need to explicitly add either the ^ and $ metacharacters, to match the start and end of the string or line, respectively, or the \A and \z metacharacters to do the same for the entire string. For a reference of .NET metacharacters check out this MSDN page. I’m not sure which set of anchors Java’s matches() uses, although this article suggests \A and \z are used.
Java’s matches() returns a boolean, and .NET provides the Regex.IsMatch() method to do the same thing (apart from the already discussed difference of matching the entire string). The .NET equivalent of Java’s find() method is the Regex.Match() method, which you can use in a loop to continue to find the next match. In addition, .NET offers a Regex.Matches() method that will do this for you and returns a collection of successful matches. Depending on your needs and the input this might be fine, but for added flexibility you may want to check Match.Success in a loop and use the Match.NextMatch() method to keep looking for matches (an example of this is available in the NextMatch link).
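The difference between matches() and find() can be seen on the Java side with the unanchored pattern from the question. This is only a sketch: the class name MatchesVersusFindSketch and its two helpers are invented for the example. The find() loop plays the role that Regex.Match() with Match.NextMatch() plays in .NET. Because every part of the pattern is optional, it can also match the empty string, so the loop skips empty matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MatchesVersusFindSketch {
    // The unanchored pattern exactly as given in the question.
    static final Pattern UNANCHORED = Pattern.compile(
        "(?i:(?:([^\\d,]+?)\\W+\\b((?:CA|SD|SC|CT|DC)\\b)?\\W*)?(\\d{5}(?:[- ]\\d{3,4})?)?)");

    // matches() succeeds only when the pattern covers the whole input.
    // Returns group 1 in that case, or null otherwise.
    static String wholeMatchGroup1(String input) {
        Matcher m = UNANCHORED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // find() in a loop yields successive partial matches.
    // Empty matches are skipped because the pattern is fully optional.
    static List<String> nonEmptyMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = UNANCHORED.matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Mountain View, CA 94043";
        System.out.println(wholeMatchGroup1(input));
        System.out.println(nonEmptyMatches(input));
    }
}
```

For the input from the question, matches() gives "Mountain View" as group 1, while the find() loop gives the two partial matches "Mountain " and "View, CA 94043", which is the same split the question reports for .NET.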