I have a script in python to process a log file – it parses

Asked: June 12, 2026

I have a script in python to process a log file – it parses the values and joins them simply with a tab.

```python
import re
import sys

p = re.compile(
    r"([0-9/]+) ([0-9]+):([0-9]+):([0-9]+) I.*"
    r"worker\(([0-9]+)\)(?:@([^]]*))?.*\[([0-9]+)\] "
    r"=RES= PS:([0-9]+) DW:([0-9]+) RT:([0-9]+) PRT:([0-9]+) IP:([^ ]*) "
    r"JOB:([^!]+)!([0-9]+) CS:([\.0-9]+) CONV:([^ ]*) URL:[^ ]+ KEY:([^/]+)([^ ]*)"
)

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    result = p.match(line)
    if result is not None:
        # Optional groups that did not participate come back as None; print "." instead.
        print("\t".join(x if x is not None else "." for x in result.groups()))
```

However, the script runs quite slowly and takes a long time to process the data.

How can I achieve the same behaviour in a faster way? Perl/SED/PHP/Bash/…?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-06-12T01:55:42+00:00

I'm writing Perl, not Python, but I recently used this technique to parse very big logs:

1. Divide the input file into chunks (for example, FileLen/NumProcessors bytes each).
2. Move the start and end of every chunk to a \n boundary, so that each worker gets only complete lines.
3. fork() NumProcessors workers, each of which reads its own byte range from the file and writes its own output file.
4. Merge the output files if needed.
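The steps above can be sketched in Python, the language of the question. This is an adaptation, not the answerer's Perl code: it uses `multiprocessing.Pool` in place of a raw fork(), and `PATTERN` is a hypothetical, simplified regex standing in for the question's full one. The helper names `chunk_boundaries`, `process_chunk` and `parse_parallel`, and the `.partN` output file names, are also made up for illustration.

```python
import os
import re
from multiprocessing import Pool

# Hypothetical, simplified stand-in for the question's pattern:
# captures the first field and the number after "PS:".
PATTERN = re.compile(r"(\S+) .*PS:([0-9]+)")


def chunk_boundaries(path, n_chunks):
    """Steps 1-2: split the file into ~n_chunks byte ranges that start on line starts."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            # Jump to the naive split point, then skip to the end of that line.
            f.seek(max(size * i // n_chunks, bounds[-1]))
            f.readline()
            bounds.append(f.tell())
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def process_chunk(job):
    """Step 3 (one worker): parse lines in [start, end) and write them to out_path."""
    path, start, end, out_path = job
    with open(path, "rb") as src, open(out_path, "w") as dst:
        src.seek(start)
        while src.tell() < end:
            line = src.readline().decode("utf-8", "replace").strip()
            m = PATTERN.match(line)
            if m is not None:
                fields = (x if x is not None else "." for x in m.groups())
                dst.write("\t".join(fields) + "\n")
    return out_path


def parse_parallel(path, out_path, workers=os.cpu_count() or 1):
    """Steps 3-4: run one worker per chunk, then concatenate their part files."""
    ranges = chunk_boundaries(path, workers)
    jobs = [(path, s, e, f"{out_path}.part{i}") for i, (s, e) in enumerate(ranges)]
    with Pool(workers) as pool:
        parts = pool.map(process_chunk, jobs)
    # Merge in chunk order so the output keeps the input's line order.
    with open(out_path, "w") as dst:
        for part in parts:
            with open(part) as src:
                dst.write(src.read())
            os.remove(part)
```

Call `parse_parallel("big.log", "big.tsv")` from inside an `if __name__ == "__main__":` block, because depending on the platform `multiprocessing` may start workers by re-importing the main module. Each worker reads its range line by line instead of loading the whole chunk, so memory use stays small even for very large logs.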

Of course, you should also optimize the regexp, for example by using .* less, because it causes a lot of backtracking, which is slow. But in any case, the bottleneck will almost certainly be the CPU time spent in this regexp, so spreading the work over 8 CPUs should help.
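To illustrate the backtracking point, here is one possible way to reduce it in the question's pattern: making the two `.*` runs lazy (`.*?`), so the engine scans forward to the nearest `worker(` or `[` instead of first running to the end of the line and backing up. The sample log line is hypothetical, since the question does not show real input, and the lazy version picks the first `worker(` rather than the last, which only matters if a line contains several. Whether this is measurably faster depends on the data, so time both versions on a real log.

```python
import re

# Hypothetical sample line; the real log format is not shown in the question,
# so this layout is an assumption chosen to fit the original pattern.
SAMPLE = ("2026/06/12 01:55:41 I [worker(3)@host1] job [1234] =RES= "
          "PS:1 DW:2 RT:3 PRT:4 IP:10.0.0.1 JOB:build!42 CS:1.5 "
          "CONV:yes URL:http://x/y KEY:abc/def")

# Shared tail of the question's pattern (unchanged).
TAIL = (r"=RES= PS:([0-9]+) DW:([0-9]+) RT:([0-9]+) PRT:([0-9]+) IP:([^ ]*) "
        r"JOB:([^!]+)!([0-9]+) CS:([\.0-9]+) CONV:([^ ]*) URL:[^ ]+ "
        r"KEY:([^/]+)([^ ]*)")

# Original: each greedy .* first runs to the end of the line, then backs up
# one character at a time until the next literal ("worker(" or "[") fits.
GREEDY = re.compile(
    r"([0-9/]+) ([0-9]+):([0-9]+):([0-9]+) I.*"
    r"worker\(([0-9]+)\)(?:@([^]]*))?.*\[([0-9]+)\] " + TAIL)

# Lazy variant: .*? grows forward from zero characters and stops at the first
# position where the rest of the pattern can continue.
LAZY = re.compile(
    r"([0-9/]+) ([0-9]+):([0-9]+):([0-9]+) I.*?"
    r"worker\(([0-9]+)\)(?:@([^]]*))?.*?\[([0-9]+)\] " + TAIL)

greedy_groups = GREEDY.match(SAMPLE).groups()
lazy_groups = LAZY.match(SAMPLE).groups()
print(greedy_groups == lazy_groups)  # True for this sample line
```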
