I’ve got a huge log file (almost 6 GB) from a game server. It is filled with millions of errors (hundreds occurred every second at the time) alongside useful records that need to be kept. I’d like to remove all the lines that belong to an error while keeping the ones showing chat messages or other information.
However, I can’t simply remove the lines I’d like to dump, because the error messages aren’t always the same and span a different number of lines each time. In short, I can’t tell which lines belong to an error without a regular expression. I’ve been looking for a program that fits my purposes, but I haven’t found one yet. sed (stream editor) seemed like a candidate, since it wouldn’t need many resources to process such a huge file. However, it works line by line, and as far as I can tell, finding and replacing across multiple lines isn’t straightforward with it.
Therefore, is there a program that supports finding and replacing regular expressions in huge text files over multiple lines? Or is it recommended to write your own script to do that job?
The log file looks as follows:
2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors.
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
at net.minecraft.server.BlockButton.a(BlockButton.java:170)
at net.minecraft.server.ItemInWorldManager.a(ItemInWorldManager.java:160)
at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:482)
at net.minecraft.server.Packet15Place.a(SourceFile:57)
at net.minecraft.server.NetworkManager.a(SourceFile:230)
at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:75)
at net.minecraft.server.NetworkListenThread.a(SourceFile:100)
at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:357)
at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:01 [INFO] <admin> Is it working yet?
2011-03-02 01:43:01 [INFO] <admin> Not really.
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:348)
at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible.
The desired result would be the following:
2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors.
2011-03-02 01:43:01 [INFO] <admin> Is it working yet?
2011-03-02 01:43:01 [INFO] <admin> Not really.
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible.
As you can see, the log file contains the same error over and over again. Even though it always starts with the date and time followed by [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms and ends with at net.minecraft.server.ThreadServerApplication.run(SourceFile:366), the error message in between is different each time. That’s why I can’t just replace the error message with an empty string.
Is there a regular expression that could help me get rid of all the lines belonging to an error while keeping the remaining lines? That way, my log file would shrink to under 50 MB, the size it had before a broken plugin made my server produce all these errors.
This Python script makes one pass through a logfile read from stdin, printing the filtered log messages to stdout.
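A minimal sketch of such a script, assuming that every log message begins with a timestamp followed by " [" (as in the sample above), might look like the following. The names filter_log, MESSAGE_START and ERROR_MARKER are illustrative choices, not taken from the original listing.

```python
import re
import sys

# Assumption: every log message starts with a timestamp followed by " [",
# for example "2011-03-02 01:43:00 [".
MESSAGE_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[")

# The error header quoted in the question.
ERROR_MARKER = "[SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms"


def filter_log(lines):
    """Yield every line that does not belong to a matching error message."""
    skipping = False  # state: True while inside an error message
    for line in lines:
        if MESSAGE_START.match(line):
            # A new log message begins here; decide which state to enter.
            skipping = ERROR_MARKER in line
        if not skipping:
            yield line


if __name__ == "__main__":
    # Stream stdin to stdout one line at a time, so memory use stays small
    # no matter how large the log file is.
    sys.stdout.writelines(filter_log(sys.stdin))
```

Saved under a hypothetical name such as filter_log.py, it could be run as python filter_log.py < server.log > filtered.log, where both file names are placeholders. Because the input is processed line by line, the 6 GB file never has to fit in memory.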
It uses a regular expression to match lines that mark the beginning of a log message (such as a line that starts with 2011-03-02 01:43:00 [). If a line that begins a log message contains [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms, the script discards that line and every line after it, up to the line that starts the next log message. Otherwise, it outputs the line. You can think of this as a finite state machine with two states, which correspond to whether the script is skipping over lines or outputting lines.
I added some special cases to the log file for testing. Here’s the log file that I tested it with:
And here’s the output: