I am using Fedora and bash to do some text manipulation on my files. I am trying to combine a large number of files, each with two columns of data. From these files, I want to extract the data in the 2nd column and put it all in a single file, keeping the shared 1st column only once. Previously, I used the following script:
paste 0_0.dat 0_6.dat 0_12.dat | awk '{print $1, $2, $4, $6}' > 0.dat
But this becomes painful as the number of files grows, and I am trying to do this with about 100 files. I searched the web for a simpler way but came up empty-handed.
I’d like to use a ‘for’ loop if possible, for example:
for i in $(seq 0 6 600)
do
paste "0_${i}.dat" | awk '{print $2}' >> 0.dat
done
but of course this does not work with paste: each pass of the loop appends one column below the previous one instead of placing it alongside.
Please let me know if you have any recommendations on how to do this.
DATA FILE #1 looks like this (delimited by a space):
-180 0.00025432
-179 0.000309643
-178 0.000189226
.
.
.
-1 2E-5
0 1.4E-6
1 0.00000
.
.
.
178 0.0023454268
179 0.002352534
180 0.001504992
DATA FILE #2
-180 0.0002352
-179 0.000423452
-178 0.00019304
.
.
.
-1 2E-5
0 1.4E-6
1 0.00000
.
.
.
178 0.0023454268
179 0.002352534
180 0.001504992
The first column runs from -180 to 180 in increments of 1.
DESIRED OUTPUT
(n is the number of files, which is also the number of data columns after the first)
-180 0.00025432 0.00025123 0.000235123 0.00023452 0.00023415 ... n
-179 0.000223432 0.0420504 0.2143450 0.002345123 0.00125235 ... n
.
.
.
-1 2E-5
0 1.4E-6
1 0.00000
.
.
.
179 0.002352534 ... n
180 0.001504992 ... n
Thanks,
How about this:
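A minimal sketch of such a script, consistent with the description below, written as a bash function so it can be tried directly (filter_data is an illustrative name; saved as a script called ‘filter-data’, the body would be the same). It pastes all of its arguments side by side and keeps column 1 plus every even-numbered column:

```shell
# Illustrative function; the answer calls its script 'filter-data'.
# paste joins the files given as arguments line by line (tab-separated);
# awk prints column 1 once, then every even-numbered column, i.e. the
# 2nd column of each input file, separated by single spaces.
filter_data() {
    paste "$@" |
    awk '{
        printf "%s", $1
        for (i = 2; i <= NF; i += 2)
            printf " %s", $i
        printf "\n"
    }'
}
```

Usage, assuming GNU seq (its -f option takes a printf-style floating-point format such as %g): filter_data $(seq -f '0_%g.dat' 0 6 600) > 0.dat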
This assumes that you don’t run into a limit with paste (check how many files it can have open). The "$@" notation means ‘all the arguments given, exactly as given’. The awk script simply prints $1 from each line of pasted output, followed by the even-numbered columns, followed by a newline. It doesn’t validate that the odd-numbered columns all match; it would perhaps be sensible to do so, and you could code a vaguely similar loop in awk to do it. It also doesn’t check that the number of fields on each line is the same as on the previous line; that’s another reasonable check. But it does do the whole job in one pass over all the files, for an essentially arbitrary list of files.

Put my original answer in a script called ‘filter-data’ and invoke the script with the 101 file names generated by seq. The paste command pastes all 101 files together; the awk command selects the columns you are interested in.

The seq command, given a format, lists the 101 file names; these are the 101 files that will be pasted.

You could even do without the filter-data script and run the paste and awk pipeline directly.

I’d probably go with the more general script as the main script, and if need be I’d create a ‘one-liner’ that invokes the main script with the specific set of arguments currently of interest.
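For the two checks mentioned above (odd-numbered columns matching column 1, and the same number of fields on every line), a sketch of a stricter variant follows. The function name filter_data_checked is hypothetical; it still prints the combined columns, reports any mismatch on standard error, and exits with status 1 if it found one:

```shell
# Hypothetical stricter variant of filter_data.
# Checks: every odd-numbered column (the angle from each file) equals $1,
# and each line has as many fields as the first line.
filter_data_checked() {
    paste "$@" |
    awk '
        NR == 1 { nf = NF }
        NF != nf {
            printf("line %d: %d fields, expected %d\n", NR, NF, nf) > "/dev/stderr"
            bad = 1
        }
        {
            for (i = 3; i <= NF; i += 2)
                if ($i != $1) {
                    printf("line %d: column %d is %s, column 1 is %s\n", NR, i, $i, $1) > "/dev/stderr"
                    bad = 1
                }
            printf "%s", $1
            for (i = 2; i <= NF; i += 2)
                printf " %s", $i
            printf "\n"
        }
        END { exit bad }
    '
}
```

Because the status is that of awk (the last command in the pipeline), a calling script can stop on bad input, for example with filter_data_checked ... > 0.dat || exit 1.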
The other key point, which might have been the stumbling block: paste is not limited to 2 files; it can paste as many files as you can have open (give or take about 3).
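Since paste accepts any number of files, the ‘for’ loop from the question can still be used, as long as the loop only collects the file names and paste runs once at the end. A sketch, assuming the files are in the current directory and a bash that supports the array += syntax; the open-file check uses the bash builtin ulimit -n and the "about 3" extra descriptors mentioned above:

```shell
# Collect the 101 names 0_0.dat, 0_6.dat, ..., 0_600.dat in an array.
files=()
for i in $(seq 0 6 600); do
    files+=("0_${i}.dat")
done

# Rough open-file check: paste needs one descriptor per input file,
# plus roughly 3 for stdin, stdout and stderr.
limit=$(ulimit -n)
if [[ $limit != unlimited ]] && (( ${#files[@]} + 3 > limit )); then
    echo "warning: ${#files[@]} files may exceed the open-file limit ($limit)" >&2
fi

# One paste call joins all files side by side; awk keeps column 1 and the
# 2nd column of each file (every even-numbered field).
paste "${files[@]}" |
    awk '{ printf "%s", $1; for (i = 2; i <= NF; i += 2) printf " %s", $i; printf "\n" }' > 0.dat
```

Quoting "${files[@]}" passes each name as a separate argument even if a name were to contain spaces.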