My Tomcat logs are built in this format:
[<DATE>] [<COMPONENT>] ERROR_TYPE <ERROR_NAME> - <Rest of line>
Where ERROR_TYPE is a log4j value like DEBUG or ERROR.
e.g.,
[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...
[18/Jul/2012:09:54:32 +0000] [component2] DEBUG IPNotFound - ...
[18/Jul/2012:09:54:32 +0000] [component1] TRACE Connected - ...
[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...
I would like to create a map from the tuple (ERROR_TYPE, ERROR_NAME) to the number of occurrences, e.g.
ERROR ConnectionTimeOut 2
DEBUG IPNotFound 1
TRACE Connected 1
How do I match something like:
_anything_ (ERROR|DEBUG|TRACE|WARN|FATAL_spaces_ _another_word_)_anything_
in AWK, and return only the part in parentheses?
Lines are selected which contain the error types. A count array element is incremented for the type and name taken together as the index. The comma represents the contents of the SUBSEP variable, which defaults to \034. In the END block, iterate over the count array, splitting the indices using the SUBSEP variable. Print the type, name and count.

Edit:
This uses a regex to handle unstructured log entries:
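The original code for this step is not shown here, so what follows is only a minimal sketch of how the regex approach could look, not necessarily the script that was posted. It assumes the level is one of the five words listed in the question and that the error name consists of letters, digits and underscores. The function name count_log_pairs and the variable names pair, parts and idx are made up for illustration.

```shell
# Hypothetical helper: reads log lines on stdin and prints "TYPE NAME COUNT".
count_log_pairs() {
    awk '
    # match() sets RSTART and RLENGTH to the position and length of the
    # leftmost match, so substr() can return only the matched part.
    # Assumption: the name is made of letters, digits and underscores.
    match($0, /(ERROR|DEBUG|TRACE|WARN|FATAL)[ \t]+[A-Za-z0-9_]+/) {
        pair = substr($0, RSTART, RLENGTH)
        split(pair, parts)            # parts[1] = type, parts[2] = name
        count[parts[1], parts[2]]++   # the comma joins the two with SUBSEP
    }
    END {
        for (key in count) {
            split(key, idx, SUBSEP)   # recover type and name from the index
            print idx[1], idx[2], count[key]
        }
    }
    '
}
```

It could be called as `count_log_pairs < logfile | sort`, where logfile stands in for the actual Tomcat log. The `sort` is there because AWK does not guarantee any particular order for `for (key in count)`.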