My iPhone app uses regular expressions (via NSRegularExpression) to process a very large number of strings (in the thousands). This of course takes a lot of time. What are some strategies to speed up the regular expressions? I looked into using blocks, but I don’t think they will do any good: they seem to mostly provide lambda functionality (i.e., as in Lisp) and are used on the Mac with multiple cores. Obviously, the current iPhone doesn’t have multiple cores.
Here’s my code:
NSString *replaceRegexPattern = @"([\\(|\\[].*?[\\)|\\]])|(^to )";
NSRegularExpression *replaceRegex = [[NSRegularExpression regularExpressionWithPattern:replaceRegexPattern
                                                                               options:NSRegularExpressionCaseInsensitive
                                                                                 error:nil] retain];
NSArray *myArray = <some data>;
NSString *myString, *compareValue;
NSUInteger i;
for (i = 0; i < [myArray count]; i++) {
    myString = [myArray objectAtIndex:i];
    // Replace every match with the empty string, i.e. delete it
    compareValue = [replaceRegex stringByReplacingMatchesInString:myString
                                                          options:0
                                                            range:NSMakeRange(0, [myString length])
                                                     withTemplate:@""];
    // do things with compareValue
}
To answer the question below, my goal in this code is to remove any text in my string which is either enclosed in parentheses or is a leading “to ”. Here are some examples:
- Hello (Goodbye) -> Hello
- Hello (Goodbye [n]) -> Hello
- To Say -> Say
- To Say (pf) -> Say
Since I don’t know what exactly you’re trying to do, it’s hard to give well-founded advice, but it looks like your regex could be improved a little.
Are you really trying to match strings like (foo), [bar], and |baz|? You don’t need the | alternator inside character classes, so unless you want to match the third example here, drop the |s.
Then, since you’re expecting strings like (foo [bar] baz), you need to separate the two kinds of parentheses, and you can also speed up your regex a bit. A faster version checks for “to ” at the start of the string first, then goes looking for an opening paren/bracket, anything except closing parens/brackets, and a closing paren/bracket. This needs less backtracking, so it’s probably a bit faster.
You won’t be able to handle nested parentheses/brackets of the same kind, such as (foo (bar) baz), with a single regex because that’s not regular anymore – unless you run the regex replace operation several times, once for each level of nesting. So the above example will be removed if you run the regex replace twice.
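As a rough sketch of the approach described above, the snippet below builds one possible pattern and reapplies it until the string stops changing. The pattern is an assumption reconstructed from the description, not necessarily the one originally posted. In particular, it also excludes opening delimiters inside a group, so each pass removes the innermost group and nested input such as (foo (bar) baz) shrinks one level per pass. The helper names MakeStripRegex and StripAnnotations are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: compiles the pattern once so it can be reused for
// every string. Assumed pattern (reconstructed, not taken from the post):
//   ^to            a leading "to " (matched case-insensitively)
//   \([^\(\)]*\)   a (...) group containing no parentheses
//   \[[^\[\]]*\]   a [...] group containing no square brackets
static NSRegularExpression *MakeStripRegex(void) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^to |\\([^\\(\\)]*\\)|\\[[^\\[\\]]*\\]"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Could not compile pattern: %@", error);
    }
    return regex;
}

// Hypothetical helper: deletes all matches, then repeats until a pass
// changes nothing. Every pass that changes the string makes it shorter,
// so the loop always terminates.
static NSString *StripAnnotations(NSString *input, NSRegularExpression *regex) {
    if (input == nil || regex == nil) {
        return input; // nothing sensible to do without a string or a pattern
    }
    NSString *current = input;
    for (;;) {
        NSString *next = [regex stringByReplacingMatchesInString:current
                                                         options:0
                                                           range:NSMakeRange(0, [current length])
                                                    withTemplate:@""];
        if ([next isEqualToString:current]) {
            return next;
        }
        current = next;
    }
}
```

In the question’s loop this would stand in for the direct call, e.g. compareValue = StripAnnotations(myString, regex), with the regex built once outside the loop. Removed groups leave their surrounding spaces behind, so trimming whitespace afterwards may still be needed. Because the whole pattern is reapplied, a leading “to ” that only appears at the start after an earlier pass is removed as well.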