Main problem: Objective-C tells me there are six matches for the pattern @"\\b(\\S+)\\b", but for a pattern of the form @"A b (c) or (d)" it reports only one match, and I only get "c" back.
Solution
Here's a function that returns all the capture groups as an NSArray. I'm an Objective-C newbie, so I suspect there are better ways to do the clunky part than building a mutable array and assigning it to an NSArray at the end.
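- (NSArray *)regexWithResults:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    // Pass 0 (or real NSRegularExpressionOptions flags such as
    // NSRegularExpressionCaseInsensitive). NSRegularExpressionSearch is an
    // NSStringCompareOptions value and does not belong here.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
                             options:0
                               error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern '%@': %@", strPattern, error);
        return nil;
    }
    NSArray *arTextCheckingResults = [regex matchesInString:haystack
                                                    options:0
                                                      range:NSMakeRange(0, [haystack length])];
    NSMutableArray *arMutable = [NSMutableArray array];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        // Range 0 is the whole match; capture groups are 1 ... numberOfRanges - 1.
        for (NSUInteger captureIndex = 1; captureIndex < ntcr.numberOfRanges; captureIndex++) {
            NSRange range = [ntcr rangeAtIndex:captureIndex];
            if (range.location == NSNotFound) {
                continue; // this group did not take part in the match, so skip it
            }
            [arMutable addObject:[haystack substringWithRange:range]];
        }
    }
    // An NSMutableArray already is an NSArray; returning an immutable copy
    // just keeps callers from mutating the result.
    return [NSArray arrayWithArray:arMutable];
}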
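One edge case worth guarding against whenever you loop over rangeAtIndex: is a capture group that exists in the pattern but did not take part in the match, such as the losing side of an alternation. Foundation reports such a group's range as {NSNotFound, 0}, and passing that range to substringWithRange: raises an exception. The sketch below only checks for that case; GroupParticipated is a hypothetical helper name, and I'm assuming a clang build against Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if capture group `group` took part in the first
// match of `pattern` against `text`, NO otherwise (including no match at all).
static BOOL GroupParticipated(NSString *text, NSString *pattern, NSUInteger group)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    // Messaging a nil regex returns nil, so an invalid pattern ends up as NO.
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil || group >= match.numberOfRanges) {
        return NO;
    }
    // A non-participating group still has a slot, but its location is NSNotFound.
    return [match rangeAtIndex:group].location != NSNotFound;
}
```

For example, with the pattern (cat)|(dog) matched against "dog", the single match still has a slot for group 1, but only group 2 has a real location.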
Problem
In Perl I am used to using parentheses to define capture groups, like this:
#!/usr/bin/perl -w
use strict;
my $str = "This sentence has words in it.";
if (my ($what, $inner) = ($str =~ /This (\S+) has (\S+) in it/)) {
    print "That $what had '$inner' in it.\n";
}
That code will produce:
That sentence had 'words' in it.
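For comparison, a rough Objective-C counterpart of that Perl snippet takes the first match and reads its capture groups by index, because a single NSTextCheckingResult carries all of them. This is a sketch, not a drop-in equivalent: FirstMatchCaptures is a hypothetical helper name, and it puts an empty string in place of a group that did not participate (where Perl would give undef).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the capture groups of the FIRST match of `pattern`
// in `text`, roughly like Perl's list-context =~. Returns nil if the pattern
// is invalid or does not match.
static NSArray *FirstMatchCaptures(NSString *text, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    NSMutableArray *captures = [NSMutableArray array];
    // Index 0 is the whole match; the groups are 1 ... numberOfRanges - 1.
    for (NSUInteger i = 1; i < match.numberOfRanges; i++) {
        NSRange r = [match rangeAtIndex:i];
        // Assumption for this sketch: use @"" for a non-participating group.
        [captures addObject:(r.location == NSNotFound) ? @"" : [text substringWithRange:r]];
    }
    return [NSArray arrayWithArray:captures];
}
```

Called with @"This sentence has words in it." and @"This (\\S+) has (\\S+) in it", it should return the two strings "sentence" and "words".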
But in Objective-C, with NSRegularExpression, I get different results. Sample function:
- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    // NSRegularExpressionSearch is an NSStringCompareOptions value, not an
    // NSRegularExpressionOptions flag, so pass 0 here.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
                             options:0
                               error:&error];
    NSRange fullRange = NSMakeRange(0, [haystack length]);
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack options:0 range:fullRange];
    NSLog(@"Pattern: '%@'", strPattern);
    NSLog(@"Search text: '%@'", haystack);
    NSLog(@"Number of matches: %lu", (unsigned long)numberOfMatches);
    NSArray *arTextCheckingResults = [regex matchesInString:haystack options:0 range:fullRange];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
        NSLog(@"Found string '%@'", match);
    }
}
Calling that test function shows that it can count the words in the string:
NSString *searchText = @"This sentence has words in it.";
[myClass regexTest:searchText pattern:@"\\b(\\S+)\\b"];
Pattern: '\b(\S+)\b'
Search text: 'This sentence has words in it.'
Number of matches: 6
Found string 'This'
Found string 'sentence'
Found string 'has'
Found string 'words'
Found string 'in'
Found string 'it'
But what if the capture groups are explicit, like so?
[myClass regexTest:searchText pattern:@".*This (sentence) has (words) in it.*"];
Result:
Pattern: '.*This (sentence) has (words) in it.*'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
Same as above, but with \S+ instead of the actual words:
[myClass regexTest:searchText pattern:@".*This (\\S+) has (\\S+) in it.*"];
Result:
Pattern: '.*This (\S+) has (\S+) in it.*'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
How about a wildcard in the middle?
[myClass regexTest:searchText pattern:@"^This (\\S+) .* (\\S+) in it.$"];
Result:
Pattern: '^This (\S+) .* (\S+) in it.$'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
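What the last three runs have in common: each pattern matches the string exactly once, and that one NSTextCheckingResult holds the whole match at index 0 and the two groups at indexes 1 and 2. The test method only ever reads rangeAtIndex:1, so the second group is never printed. A sketch that logs every range of every match and returns numberOfRanges per match makes this visible; RangesPerMatch is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: logs every range of every match and returns an array
// with each match's numberOfRanges (whole match plus one per capture group).
static NSArray *RangesPerMatch(NSString *haystack, NSString *pattern)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSArray *matches =
        [regex matchesInString:haystack options:0 range:NSMakeRange(0, haystack.length)];
    NSMutableArray *counts = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        [counts addObject:@(match.numberOfRanges)];
        for (NSUInteger i = 0; i < match.numberOfRanges; i++) {
            NSRange r = [match rangeAtIndex:i];
            if (r.location == NSNotFound) {
                continue; // this group did not take part in the match
            }
            NSLog(@"  range %lu: '%@'", (unsigned long)i, [haystack substringWithRange:r]);
        }
    }
    return [NSArray arrayWithArray:counts];
}
```

For the two-group patterns above I would expect a single count of 3, and for @"\\b(\\S+)\\b" six counts of 2, which is why the one-group case looked correct.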
References:
NSRegularExpression
NSTextCheckingResult
NSRegularExpression matching options
I think if you change
to
you will get the expected result for patterns containing a single capture group.
See the documentation for -[NSTextCheckingResult rangeAtIndex:]:
A result must have at least one range, but may optionally have more (for example, to represent regular expression capture groups).
Passing rangeAtIndex: the value 0 always returns the value of the range property. Additional ranges, if any, will have indexes from 1 to numberOfRanges-1.
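The relationship in that quote can be checked directly. The sketch below assumes that, for regular-expression results, numberOfRanges is always the pattern's numberOfCaptureGroups plus one (even when a group did not participate), and that range 0 is identical to the range property; it returns YES if every match satisfies both. CheckRangeInvariants is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical check of the documented range layout for regex results:
//   [result rangeAtIndex:0] equals result.range
//   result.numberOfRanges == regex.numberOfCaptureGroups + 1 (assumption)
static BOOL CheckRangeInvariants(NSString *text, NSString *pattern)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) {
        return NO;
    }
    NSArray *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        if (!NSEqualRanges([match rangeAtIndex:0], match.range)) {
            return NO;
        }
        if (match.numberOfRanges != regex.numberOfCaptureGroups + 1) {
            return NO;
        }
    }
    return YES;
}
```

If that holds, a loop from 1 to numberOfRanges-1 visits every group in the pattern, which is what a multi-group pattern needs instead of a fixed rangeAtIndex:1.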