I have a few very large log files that I need to parse. Ease of implementation obviously points me to a Perl and regex combo (in which I am still a novice). But what about speed? Would it be faster to implement this in C? Each log file is on the order of 2 GB.
I very much doubt C will be faster than Perl unless you were to hand-compile the RE.
By hand-compiling, I mean coding the finite state machine (FSM) directly rather than using the RE engine to compile it. That lets you optimize it for your specific case, which can often be faster than relying on the more general-purpose engine.
But that’s not something I’d suggest to anyone who hasn’t already written compilers or parsers by hand, that is, without the benefit of lex, yacc, bison or similar tools.
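To make "coding the FSM directly" a bit more concrete, here is a minimal sketch in C. The log format is purely my assumption for illustration: a numeric timestamp, a space, an uppercase level and a colon, i.e. the equivalent of the regex `^[0-9]+ [A-Z]+:`. The function name `matches_prefix` and the state names are made up as well.

```c
#include <stdbool.h>

/* States of a hand-coded FSM equivalent to the regex ^[0-9]+ [A-Z]+:
   (an assumed log prefix, e.g. "1700000000 ERROR:"). */
enum state { S_START, S_DIGITS, S_SPACE, S_LEVEL };

static bool is_digit(char c) { return c >= '0' && c <= '9'; }
static bool is_upper(char c) { return c >= 'A' && c <= 'Z'; }

/* Returns true if `line` starts with digits, one space, uppercase letters
   and a colon. Hypothetical helper, not part of any library. */
bool matches_prefix(const char *line)
{
    enum state st = S_START;
    for (const char *p = line; *p != '\0'; ++p) {
        char c = *p;
        switch (st) {
        case S_START:                 /* need at least one digit */
            if (!is_digit(c)) return false;
            st = S_DIGITS;
            break;
        case S_DIGITS:                /* more digits, or the separating space */
            if (is_digit(c)) break;
            if (c != ' ') return false;
            st = S_SPACE;
            break;
        case S_SPACE:                 /* need at least one uppercase letter */
            if (!is_upper(c)) return false;
            st = S_LEVEL;
            break;
        case S_LEVEL:                 /* more letters, or the closing colon */
            if (is_upper(c)) break;
            return c == ':';
        }
    }
    return false;                     /* input ended before the colon */
}
```

Every character is looked at exactly once and there is no backtracking, which is where the speed comes from. The price is that any change to the pattern means rewriting the state machine by hand.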
The generalized engines, such as PCRE, are usually powerful and fast enough (for my needs anyway, and those needs have often been very demanding).
A general RE engine has to be able to handle all sorts of cases, whether your program is written in C or Perl. So when you ask which is faster, you only have to compare what the two RE engines themselves are written in (hint: the Perl RE engine is not written in Perl).
They’re both written in C so you should find very little difference in terms of the matching speed.
You may find differences in the support code around the REs but that will be minimal, especially if it’s a simple read/match/output loop.
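For comparison, a simple read/match/output loop in C might look roughly like the sketch below. I am assuming the POSIX `regcomp`/`regexec` interface from `<regex.h>` here rather than PCRE, simply because it needs no extra library on a POSIX system. The function name `filter_lines` and the 4096-byte line buffer are my own choices.

```c
#include <regex.h>
#include <stdio.h>

/* Copies every line of `in` that matches the extended regex `pattern`
   to `out`. Returns the number of matching lines, or -1 if the pattern
   does not compile. Hypothetical helper for illustration only. */
long filter_lines(FILE *in, FILE *out, const char *pattern)
{
    regex_t re;
    /* Compile once, outside the loop. REG_NOSUB tells the engine we only
       want a yes/no answer, not the positions of captured groups. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    char line[4096];   /* assumed maximum line length; longer lines are
                          read (and matched) in several chunks */
    long hits = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            fputs(line, out);
            ++hits;
        }
    }
    regfree(&re);
    return hits;
}
```

Calling `filter_lines(stdin, stdout, "ERROR [0-9]+")` from `main` gives you a small grep-like filter. The supporting code is little more than `fgets` and `fputs`; the matching itself happens inside the C library's engine, just as Perl's matching happens inside its own C engine.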