There are so many goodies that come with a modern Unix shell environment that the thing I need is almost always installed on my machine or a quick download away; the trouble is just finding it. In this case, I’m trying to find basic statistical operations.
For example, right now I’m prototyping a crawler-based app. Thanks to wget plus some other goodies, I now have a few hundred thousand files. To estimate the cost of doing this with billions of files, I’d like to get the mean and median of file sizes over a certain limit. E.g.:
% ls -l | perl -ne '@a=split(/\s+/); next if $a[4] <100; print $a[4], "\n"' > sizes
% median sizes
% mean sizes
Sure, I could code my own median and mean bits in a little bit of perl or awk. But isn’t there already some noob-friendly package that does this and a lot more besides?
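For reference, the hand-rolled route might look roughly like the sketch below. It is only a sketch: the function names mean and median are hypothetical helpers chosen to match the commands above, not an existing package, and it assumes the input file holds exactly one number per line with no blank lines.

```shell
# Hypothetical helpers; each takes a file with one number per line.

# Arithmetic mean: sum every value, then divide by the count.
mean() {
    awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }' "$1"
}

# Median: sort numerically, then take the middle value
# (or the average of the two middle values for an even count).
median() {
    sort -n "$1" | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

Once these are defined in the current shell, "median sizes" and "mean sizes" behave like the commands wished for above. Note that median holds every value in memory, which is fine for a few hundred thousand sizes but not something to assume for billions.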
Can you install R? Then littler and its r command can help:
This is an example we had used before, hence the R function summary(), whose output includes the median and mean, as well as an ASCII-art stem plot. Generalizing to just calling median() or mean() is of course pretty straightforward.