When I review logs on a server, I want a quick overview of the types of problems I'm facing before I go digging deeper. I usually use the one-liner below, which reduces the number of log lines I need to review by making each line more generic and keeping only the unique ones.
One-liner
cat /var/log/apache2/error.log | sed 's/.*] \(.*\)/\1/' | sed 's/[0-9]*//g' | sort | uniq
Explanation
| sed 's/.*] \(.*\)/\1/': removes everything up to and including the last bracket group, which typically holds client/system-specific info such as [Mon Dec 05 12:01:03 2011] [error] [client a.b.c.d]
| sed 's/[0-9]*//g': removes numbers
| sort | uniq: keeps a single copy of each distinct line.
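To make the two sed stages concrete, here is a small sketch that runs them on a single line. The sample line is made up for illustration (the client address, message and path are assumptions, not taken from a real log):

```shell
# Made-up sample line in the Apache error-log format quoted above.
line='[Mon Dec 05 12:01:03 2011] [error] [client 10.0.0.1] File does not exist: /var/www/page42.html'

# Stage 1: drop everything up to and including the last "] ".
stage1=$(printf '%s\n' "$line" | sed 's/.*] \(.*\)/\1/')
echo "$stage1"

# Stage 2: strip the digits so lines that differ only by a number collapse.
stage2=$(printf '%s\n' "$stage1" | sed 's/[0-9]*//g')
echo "$stage2"
```

After the second stage, page42.html and page7.html would both read page.html, which is what lets sort | uniq fold them into one line.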
To give you an idea, on a /var/log/apache2/error.log that contains around 500 lines, this filters the output down to 25 lines. Of course, the more generic/similar the log entries are, the more effective the one-liner is.
What I'm looking for now is a script (could be bash, perl, python or anything else actually) that does the same thing but is a bit more advanced, so as to be even more effective (e.g. obfuscate file paths, alphabetical ids…) and more useful (count how many occurrences of each message, % compared to the overall number of log lines…).
Do you know of any scripts that do that?
Have a look at logwatch. It sends you a daily overview of common logfiles. I have only used it for mail logs so far, but afaik it can handle apache logs as well.
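If a small script is enough, the counting and percentage part of the question can be sketched in plain shell by extending the original one-liner. This is only a rough sketch under a few assumptions: summarize_log is a hypothetical function name, the path rule only catches paths that follow a space, and numbers are replaced by a placeholder N instead of being deleted.

```shell
# Sketch only: summarize_log is a hypothetical helper, not an existing tool.
# It reads the files given as arguments, or stdin when there are none.
summarize_log() {
    # 1. keep only the text after the last "] "
    # 2. replace anything that looks like " /some/path" with " /PATH" (assumption)
    # 3. replace runs of digits with the placeholder N (assumption)
    sed -e 's/.*] \(.*\)/\1/' \
        -e 's| /[^ ,]*| /PATH|g' \
        -e 's/[0-9][0-9]*/N/g' "$@" |
    # count identical lines, most frequent first
    sort | uniq -c | sort -rn |
    # split "count message" and add each message's share of the total
    awk '{ count[NR] = $1; sub(/^ *[0-9]+ /, ""); msg[NR] = $0; total += count[NR] }
         END { for (i = 1; i <= NR; i++)
                   printf "%6d %5.1f%%  %s\n", count[i], 100 * count[i] / total, msg[i] }'
}
```

It could then be called as summarize_log /var/log/apache2/error.log. Each output line shows the count, the percentage of all log lines, and the generalized message.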