This question might seem vague, sorry. Does anybody have experience writing regexes in both Objective-C and Python? I am wondering how the two compare on performance: which is faster in terms of (1) runtime speed and (2) memory consumption? I have a Mac OS application that is always running in the background, and I’d like my app to index some text files as they are saved, and then save the result. I could write a regex method in my app in Obj-C, or I could potentially write a separate app using Perl or Python (I’m just a beginner in Python).
(Thanks, I got some good info from some of you already. Boo to those who downvoted; I am here to learn, and I might have some stupid questions from time to time. That’s part of the deal.)
If you’re looking for raw speed, neither of those two would be a very good choice. For execution speed, you’d choose Perl. For how quickly you could code it up, either Python or Perl would easily beat the time to write it in Objective-C, just as both would easily beat a Java solution. High-level languages that take less time to code up are always a win if all you’re measuring is time-to-solution, compared with solutions that take many more lines of code.
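To give a feel for the time-to-solution point, here is a minimal sketch of the indexing task from the question in Python. The pattern (`\w+`), the function names, and the choice to record line numbers are all my assumptions for illustration; the question doesn't say what actually needs to be indexed.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical pattern: runs of word characters stand in for "things to index".
WORD_RE = re.compile(r"\w+")

def index_text(text):
    """Map each lowercased word to the line numbers on which it appears."""
    index = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            index[match.group().lower()].append(lineno)
    return dict(index)

def index_file(path):
    """Read one text file (assumed to be UTF-8) and index its contents."""
    return index_text(Path(path).read_text(encoding="utf-8"))
```

Something like `index_file("notes.txt")` (a made-up filename) would then hand back a plain dict you could save however you like.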
As far as actual run-time performance goes, Perl’s regexes are written in very tightly coded C, and are known to be the fastest and most flexible regexes available. The regex optimizer does a lot of very clever things to the compiled regex program, such as applying an Aho–Corasick start-point optimization for finding where an alternation trie can start matching. Nobody else does that. Heck, I don’t think anybody else but Perl even bothers to optimize alternations into tries, which is the thing that takes you from O(n) to O(1) in the number of alternatives, because the compiler spent more time doing something smart so that the interpreter runs much faster. Perl regexes also offer substantial improvements in debugging and profiling. They’re also more flexible than Python’s, but the debugging alone is enough to tip the balance.
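The trie idea can be sketched in a few lines of Python. This is only an illustration of why a trie makes the cost independent of the number of alternatives, not how Perl actually implements it; the function names are made up, and unlike a real alternation (where the leftmost alternative wins) this toy version returns the longest match, so the two functions only agree when no alternative is a prefix of another.

```python
def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root

def match_naive(words, text, pos=0):
    """Try each alternative in turn: work grows with the number of alternatives."""
    for word in words:
        if text.startswith(word, pos):
            return word
    return None

def match_trie(trie, text, pos=0):
    """Walk the trie one character at a time: work grows only with match length."""
    node = trie
    best = None
    i = pos
    while True:
        if "" in node:
            best = text[pos:i]  # a complete alternative ends here
        if i >= len(text) or text[i] not in node:
            return best
        node = node[text[i]]
        i += 1

words = ["foo", "bar", "baz", "quux"]
trie = build_trie(words)
```

With a handful of alternatives you won't notice the difference; it starts to matter when the alternation has hundreds or thousands of branches.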
The only exception on performance matters is with certain pathological patterns that degenerate when run under any recursive backtracker, whether Perl’s, Java’s, or Python’s. Those can be addressed by using the highly recommended RE2 library, written by Russ Cox, as a replacement plugin. I know it’s available as a transparent replacement regex engine for Perl, and I’m pretty sure I remember seeing that it was available for Python, too.
On the other hand, if you really want to use Python but just want a more expressive and robust regex library, particularly one that is well-behaved on Unicode, then you want to use Matthew Barnett’s regex module, available for both Python 2 and Python 3. Besides conforming to tr18’s level-1 compliance requirements (that’s the standards doc on Unicode regexes), it also has all kinds of other clever features, some of which are completely sui generis. If you’re a regex connoisseur, it’s very much worth checking out.
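As for the pathological patterns mentioned above, here is a small sketch of what that looks like with Python’s built-in `re` module. The nested quantifier `(a+)+` is the textbook example I picked, and `time_search` is just a hypothetical helper; I’m assuming a subject of 18 `a`s followed by a `b` so it fails slowly without taking all day.

```python
import re
import time

def time_search(pattern, text):
    """Return (match_or_None, elapsed_seconds) for a single re.search call."""
    start = time.perf_counter()
    result = re.search(pattern, text)
    return result, time.perf_counter() - start

# The trailing "b" forces the match to fail, which is what triggers
# the heavy backtracking in the nested pattern.
subject = "a" * 18 + "b"

nested_match, nested_time = time_search(r"^(a+)+$", subject)  # nested quantifier
flat_match, flat_time = time_search(r"^a+$", subject)         # same language, no nesting

print(f"nested: {nested_time:.6f}s  flat: {flat_time:.6f}s")
```

Both patterns reject the subject, but the nested one should take far longer, and adding a few more `a`s makes it dramatically worse. An automaton-based engine like RE2 is designed to avoid this kind of blow-up.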