On a Linux server that I work with, a process writes randomly named files at random intervals. Here’s a small sample, showing the file size, modification date and time, and file name:
27659 2009-03-09 17:24 APP14452.log
0 2009-03-09 17:24 vim14436.log
20 2009-03-09 17:24 jgU14406.log
15078 2009-03-10 08:06 ySh14450.log
20 2009-03-10 08:06 VhJ14404.log
9044 2009-03-10 15:14 EqQ14296.log
8877 2009-03-10 19:38 Ugp14294.log
8898 2009-03-11 18:21 yzJ14292.log
55629 2009-03-11 18:30 ZjX14448.log
20 2009-03-11 18:31 GwI14402.log
25955 2009-03-12 19:19 lRx14290.log
14989 2009-03-12 19:25 oFw14446.log
20 2009-03-12 19:28 clg14400.log
(Note that sometimes the file size can be zero.)
What I would like is a bash script to sum the size of the files, broken down by date, producing output something like this (assuming my arithmetic is correct):
27679 2009-03-09
33019 2009-03-10
64547 2009-03-11
40964 2009-03-12
The results would show activity trends over time, and highlight the exceptionally busy days.
In SQL, the operation would be a cinch:
SELECT SUM(filesize), filedate FROM files GROUP BY filedate;
Now, this is all probably pretty easy in Perl or Python, but I’d really prefer a bash shell or awk solution. It seems especially tricky to me to group the files by date in bash (especially if you can’t assume a particular date format). Summing the sizes could be done in a loop I suppose, but is there an easier, more elegant, approach?
I often use this idiom of Awk:
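A minimal sketch of one such idiom (not necessarily the exact one meant above), assuming the listing arrives on standard input in the format shown in the question: size in the first field and an ISO-style date in the second. The function name `sum_by_date` is made up for illustration. It uses an awk associative array keyed by date, which plays the role of SQL's `GROUP BY`:

```shell
# sum_by_date (hypothetical name): read "size date time name" lines on stdin
# and print "total date" for each distinct date, sorted by date.
sum_by_date() {
    awk '{ total[$2] += $1 }                        # accumulate size per date
         END { for (d in total) print total[d], d }' | sort -k2
}

# Example using a few of the sample lines from the question.
sum_by_date <<'EOF'
27659 2009-03-09 17:24 APP14452.log
0 2009-03-09 17:24 vim14436.log
20 2009-03-09 17:24 jgU14406.log
15078 2009-03-10 08:06 ySh14450.log
EOF
```

The `sort -k2` is there because awk's `for (d in total)` visits keys in no guaranteed order. To avoid depending on how `ls` formats dates, one option, assuming GNU find is available, is to generate the listing with `find . -maxdepth 1 -type f -name '*.log' -printf '%s %TY-%Tm-%Td %TH:%TM %f\n' | sum_by_date`.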