My company’s proprietary software generates a log file that is much easier to use if it is parsed. The log parser we all use was written by another employee as a side project, and it has horrible performance.
These log files can grow to tens of megabytes very quickly, and the parser we currently use has trouble with any log file larger than 1 megabyte.
So, I want to write a program that can parse this massive amount of text in the shortest amount of time possible. We use Windows exclusively, so running on Windows is a must. Our current implementation runs on a local web server, and I’m convinced that running it as an application would have to be faster.
All suggestions will be helpful. Thanks.
EDIT: My ultimate goal is to parse the text and display it in a much more user-friendly format, with colors and such. Can you do this with Perl and Python? I know you can with Java and C++. It would work like Notepad: you open a log file, but the screen shows the user-friendly format instead of the raw file.
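For the Python half of the question: yes, and the standard library is enough. Below is a minimal sketch, assuming a made-up format where each line contains a level keyword such as ERROR, WARN or INFO; the names `tag_for`, `show_log` and `LEVEL_COLORS` are purely illustrative. It uses tkinter's `Text` widget, which can color ranges of text through named tags.

```python
import re
from typing import Optional

# Hypothetical format: each line contains a level keyword such as ERROR, WARN or INFO.
LEVEL_RE = re.compile(r"\b(ERROR|WARN|INFO)\b")

# One display color per level; the choices are illustrative only.
LEVEL_COLORS = {"ERROR": "red", "WARN": "orange", "INFO": "blue"}


def tag_for(line: str) -> Optional[str]:
    """Return the level keyword found in a line (used as a Text tag name), or None."""
    match = LEVEL_RE.search(line)
    return match.group(1) if match else None


def show_log(path: str) -> None:
    """Open a read-only, Notepad-like window showing the log colored by level."""
    import tkinter as tk  # imported here so tag_for() works without a display

    root = tk.Tk()
    root.title(path)
    text = tk.Text(root, wrap="none")
    text.pack(fill="both", expand=True)

    # Each level becomes a tag; any text inserted with that tag gets its color.
    for level, color in LEVEL_COLORS.items():
        text.tag_configure(level, foreground=color)

    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            tag = tag_for(line)
            if tag:
                text.insert("end", line, tag)
            else:
                text.insert("end", line)

    text.configure(state="disabled")  # stop the user from editing the log
    root.mainloop()
```

Calling `show_log("app.log")` opens the window. For logs of tens of megabytes you would probably want to batch the inserts or load only what is visible; this sketch does neither.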
EDIT: I can't pick a single best answer; the takeaway was to choose a language that can best display what I'm going for, and then write the parser in that. Also, using ANTLR will probably make this process much easier. I changed the original question, since I don't think I originally asked what I was really looking for. Thanks everyone!
Hmmm, “go with what you know” was a good answer. Perl was designed for this sort of thing (IMO it's well suited to simple parsing, though I'd personally avoid it for complex projects).
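As an illustration of that kind of simple, line-oriented parsing (shown in Python rather than Perl, to keep to one language here), here is a minimal sketch assuming a hypothetical `date time LEVEL message` format per line. It reads the input lazily, one line at a time, so memory use stays flat even for logs of tens of megabytes. `parse`, `Entry` and the line format are illustrative assumptions, not taken from any existing tool.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Hypothetical line format, for illustration only:
#   2024-01-31 12:34:56 ERROR Something went wrong
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


class Entry(NamedTuple):
    date: str
    time: str
    level: str
    message: str


def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per log record, consuming the input lazily.

    Lines that don't match LINE_RE (e.g. stack-trace continuations) are
    folded into the previous entry's message; any such lines before the
    first matching line are dropped.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = LINE_RE.match(line)
        if match:
            if current is not None:
                yield current
            current = Entry(**match.groupdict())
        elif current is not None:
            current = current._replace(message=current.message + "\n" + line)
    if current is not None:
        yield current
```

Because `parse` accepts any iterable of strings, you can pass it an open file directly; for example, `collections.Counter(e.level for e in parse(f))` counts entries per level in one pass.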
If it gets even a little complex, why not use a proper syntax and grammar set-up?
Lex & Yacc (or Flex & Bison) spring to mind, but personally I would always reach for Antlr.
Define the various “words” in terms of patterns (the syntax) and the rules for combining those words (the grammar), and Antlr will generate a program that parses your input. It can generate that program in Java, C, C++ and more; since you are worried about parse time, choose a compiled language, of course.
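To make the “words” versus “rules” split concrete, here is a hand-written Python sketch of a tiny lexer and one grammar rule for a made-up log line of the form `TIMESTAMP LEVEL message`. It is not ANTLR output (ANTLR would generate the equivalent code from a grammar file), and the token names, `tokenize`, `expect` and `parse_entry` are all illustrative. The comments show roughly how the same pieces might be written as ANTLR rules.

```python
import re
from typing import Dict, List, NamedTuple

# Lexer: the "words", each defined by a pattern. Order matters: earlier
# patterns win. The LEVEL token corresponds roughly to the ANTLR lexer rule
#   LEVEL : 'ERROR' | 'WARN' | 'INFO' ;
TOKEN_SPEC = [
    ("TIMESTAMP", r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"),
    ("LEVEL", r"(?:ERROR|WARN|INFO)\b"),
    ("WS", r"\s+"),
    ("TEXT", r"\S+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


class Token(NamedTuple):
    kind: str
    text: str


class ParseError(Exception):
    pass


def tokenize(line: str) -> List[Token]:
    """Split a line into tokens; whitespace separates words but is discarded."""
    tokens = []
    for match in MASTER_RE.finditer(line):
        if match.lastgroup != "WS":
            tokens.append(Token(match.lastgroup, match.group()))
    return tokens


def expect(tokens: List[Token], index: int, kind: str) -> str:
    """Return the text of tokens[index], or raise if it isn't of the given kind."""
    if index >= len(tokens) or tokens[index].kind != kind:
        found = tokens[index].text if index < len(tokens) else "end of line"
        raise ParseError(f"expected {kind} at token {index}, found {found!r}")
    return tokens[index].text


def parse_entry(line: str) -> Dict[str, str]:
    """Grammar rule, roughly  entry : TIMESTAMP LEVEL .* ;  in ANTLR notation."""
    tokens = tokenize(line.strip())
    timestamp = expect(tokens, 0, "TIMESTAMP")
    level = expect(tokens, 1, "LEVEL")
    # The message is every remaining word, rejoined with single spaces.
    message = " ".join(tok.text for tok in tokens[2:])
    return {"timestamp": timestamp, "level": level, "message": message}
```

The payoff of separating patterns from rules is that a malformed line fails with an error naming what was expected (`parse_entry("hello world")` raises `ParseError: expected TIMESTAMP at token 0, found 'hello'`) instead of quietly producing odd output.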
I personally find it tedious to hand-craft parsers, and even more tedious to debug them, but AntlrWorks is a lovely IDE which really makes it a piece of cake …
(AntlrWorks screenshot not reproduced here; the bit at the bottom of it was defining a grammar rule.)
If you mess up your grammar rules, you will be informed. That is not the case with hand-crafted parsers, where you just scratch your head and wonder about the “strange results”…
Check it out. Even if you think your project is trivial now, it may well grow. And if you have any interest in parsing, you owe it to yourself to at least be familiar with lex/yacc, and especially Antlr(Works).