I have the following simple script for parsing out dates from IRC logs (created by irssi):
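```bash
#!/bin/bash
query=$1
# Collect matching lines and date markers, each prefixed with its line number
grep -n "$query" logfile > matches.log
grep -n "Day changed" logfile >> matches.log
# Sort numerically on the line-number prefix to restore log order
sort -n matches.log
```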
It produces output like this (shown without the `N:` line-number prefixes that `grep -n` adds):
```
--- Day changed Tue Jul 03 2012
--- Day changed Wed Jul 04 2012
--- Day changed Thu Jul 05 2012
16:54 <@Hamatti> who let the dogs out
--- Day changed Fri Jul 06 2012
--- Day changed Sat Jul 07 2012
--- Day changed Sun Jul 08 2012
12:11 <@Hamatti> dogs are fun
```
But since I'm only interested in the dates of actual matches, I'd like to filter out every `--- Day changed XXX XXX dd dddd` line that is not immediately followed by a timestamped line. So the example should output:
```
--- Day changed Thu Jul 05 2012
16:54 <@Hamatti> who let the dogs out
--- Day changed Sun Jul 08 2012
12:11 <@Hamatti> dogs are fun
```
This would get rid of all the noise that isn't useful.
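The filtering described above (keep a date line only if a matching line follows it) can also be done in a single pass with `awk`: remember the most recent `--- Day changed` line and print it only when a matching line comes along. This is a minimal sketch, assuming date lines always start with `--- Day changed` and that the query is treated as an awk extended regular expression; the function name `dates_for_matches` is hypothetical. The query is passed through the environment (`ENVIRON`) rather than `awk -v`, because `-v` would interpret backslash escapes in the pattern.

```shell
#!/bin/bash
# Hypothetical helper: print each "--- Day changed" line only when at least
# one line matching the query follows it, together with those matching lines.
# Usage: dates_for_matches QUERY LOGFILE
dates_for_matches() {
    Q="$1" awk '
        /^--- Day changed/ { day = $0; next }   # remember the latest date line
        $0 ~ ENVIRON["Q"] {                      # query used as a dynamic regex
            if (day != "") { print day; day = "" }  # print the date once per day
            print
        }
    ' "$2"
}
```

Called as `dates_for_matches dogs logfile`, it prints each date at most once, directly before the first match of that day, and reads the log only once.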
Edit:

After the answer by T. Zelieke, I realised I could turn this into a one-liner, so I now use the following, which also avoids reading the logfile twice:
```bash
query=$1
grep -E "$query|Day changed" logfile | grep -B1 "^[^-]" | sed '/^--$/d'
```
This uses `grep` to keep only lines that do NOT start with a dash (`"^[^-]"`). `-B1` asks it to also print the line immediately before each match. Unfortunately, `grep` then separates each group of output (here, a pair of lines) with a `--` line, so I pipe the output through `sed` to delete those superfluous lines.
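If GNU grep is available, its `--no-group-separator` option suppresses the `--` lines directly, so the `sed` step can be dropped. This is a sketch assuming GNU grep (other grep implementations may not support this option); `dates_for_matches_gnu` is a hypothetical wrapper name, with `$1` standing for the query and `$2` for the log file.

```shell
#!/bin/bash
# Hypothetical wrapper: $1 = query (extended regex), $2 = irssi log file.
# Requires GNU grep for --no-group-separator.
dates_for_matches_gnu() {
    # First grep keeps matching lines plus date lines; second grep keeps lines
    # not starting with "-" and, via -B1, the line just before each of them,
    # without printing "--" between groups.
    grep -E "$1|Day changed" "$2" | grep --no-group-separator -B1 '^[^-]'
}
```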