So I have about 1000 files with multiple columns, but I’m only interested in some stats for two of those columns. If $4 were something like a star’s spectral class (i.e. a unique value) and $5 in each of these files were a result, like seen, unseen, unknown, etc., is there a recommended way to grep or awk out the stats across the 1000 or so files so that I get something like:
Type O, #verified, #not-verified, #property-j, ...
Type B, ...
Type A, ...
.
.
.
Type i,
Where, in each file, you’d see something like:
$1, $2, $3, Spectral Type, Result
foo, foo, foo, A, verified
foo, foo, foo, G, verified
foo, foo, foo, A, unknown
foo, foo, foo, F, verified
foo, foo, foo, G, verified
foo, foo, foo, K, verified
foo, foo, foo, K, seen
If your question is "How do I generate output of the form 'Type $4, $5', where $4 and $5 are the 4th and 5th columns of the input, respectively?", one solution is:
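A minimal sketch of such a command, assuming the fields are separated by whitespace and the first line of each file is a header to skip (both are assumptions about the data); `type_result` is a hypothetical helper name:

```shell
# Hypothetical helper: print "Type <col4> <col5>" for every data row.
# Assumes whitespace-separated fields, so with input like
# "foo, foo, foo, A, verified" the 4th field is "A," (comma included).
type_result() {
    for file in "$@"; do
        # FNR > 1 skips the first line of each file (assumed header row).
        awk 'FNR > 1 { print "Type", $4, $5 }' "$file"
    done
}
```

Called as `type_result list of input file`, this prints one line such as `Type A, verified` per data row.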
This gives the output that it seems you want, but relies on none of the columns containing whitespace. If there may be whitespace, you can do:
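A sketch of the comma-separated variant, again assuming a header line in each file; `type_result_csv` is a hypothetical name. Splitting on `,` keeps multi-word values together, and the spaces that follow each comma in the input stay attached to the fields:

```shell
# Hypothetical helper: split on commas instead of whitespace.
# Each field keeps the spaces that followed the comma in the input,
# so " A" and " verified" are printed with their leading space.
type_result_csv() {
    for file in "$@"; do
        # FNR > 1 skips the first line of each file (assumed header row).
        awk -F',' 'FNR > 1 { print "Type" $4 "," $5 }' "$file"
    done
}
```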
You may want to trim the extra whitespace that this generates. Please note that although the example hardcodes the list of input files as the four file names "list", "of", "input", and "file", I do not expect you to type the names in. Instead, you should generate them in some fashion; I have merely demonstrated one (of many!) methods of iterating over a set of files. It seems that the heart of this question is the portion dealing with awk, and not the iteration.
A second reading of the question indicates that you have exactly one row per input file and you want to summarize the results in a single file. In that case, just do:
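A minimal sketch of that single-pass version, assuming whitespace-separated fields and one header line per file (assumptions, as is the helper name `summarize_all`). awk accepts all the files at once, so no shell loop is needed:

```shell
# Hypothetical helper: read every file in one awk invocation.
# FNR restarts at 1 for each input file, so FNR > 1 skips each
# file's first line (assumed header row).
summarize_all() {
    awk 'FNR > 1 { print "Type", $4, $5 }' "$@"
}
```

Redirect the result to collect it in one place, for example `summarize_all list of input file > summary` (`summary` is a placeholder name). Piping the output through `sort | uniq -c` instead would count how often each type/result pair occurs.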