My last question wasn’t quite specific enough, and although I’m a lot closer, I am still having problems joining my 3 text tables in a way that makes sense. Here they are in more detail:
T1_01 = Table 1
No Object CCmax Vhel cont noise Mag1
001 _P10644 0.816 123.04 2450.3 74.2 15.34
002 Parked -99.900 -99.90 -99.9 -99.9 -99.90
003 _P10569 0.791 146.30 2650.7 75.3 15.50
004 _P10769 0.641 141.49 482.7 30.2 16.42
005 _P10572 0.848 138.15 2161.4 46.3 15.85
T1_02 = Table 2
Fibrel Namel Typel Pivl RAl DECl Magl
001 F1_P10644 P 1 4.89977691 -0.5104696 15.3
002 Parked N 2 4.88965087 -0.4904939 0.0
003 F1_P10569 P 3 4.89642427 -0.5099916 15.5
004 F1_P10769 P 4 4.90643599 -0.5112466 16.4
005 F1_P10572 P 5 4.89644907 -0.5105655 15.8
T1_03 = Table 3
Name RA DEC Imag Fieldname fiber RV eRV
F1_P10644 4.899776910023531 -0.510469633262908 15.34 100606F1red 001 122.47 2.94
F1_P10569 4.896424277974554 -0.509991655454702 15.50 100606F1red 003 145.55 2.72
F1_P10769 4.906435995618358 -0.511246644149622 16.42 100606F1red 004 116.28 12.87
F1_P10572 4.896449076194342 -0.510565529409031 15.85 100606F1red 005 136.15 3.01
The table output I am hoping for is:
T1_0123 (joined on column 1 T1_01, column 1 T1_02, and column 6 T1_03)
No Object CCmax Vhel cont noise Mag1 Fibrel Namel Typel Pivl RAl DECl Magl Name RA DEC Imag Fieldname fiber RV eRV
where line1 =
001 _P10644 0.816 123.04 2450.3 74.2 15.34 001 F1_P10644 P 1 4.89977691 -0.5104696 15.3 F1_P10644 4.899776910023531 -0.510469633262908 15.34 100606F1red 001 122.47 2.94
and line2 =
002 Parked -99.9 -99.9 -99.9 -99.9 -99.9 002 Parked N 2 4.88965087 -0.4904939 0.0 -99.9 -99.9 -99.9 -99.9 -99.9 -99.9 -99.9 -99.9
So that -99.9 was written into the line that had no match for the 3rd file.
Now I CAN join the files if I skip the header with:
join -1 1 -2 1 |awk 'NR != 1' <T1_02 |awk 'NR != 1'<T1_01 >T1_021
join -1 1 -2 6 T1_021 |awk 'NR != 1'<T1_03 >T1_0123
However, this ONLY prints the results of the first table listed in the join, so I don’t get all the columns I need. Likewise, if I want all 3 tables I ‘could’ do:
paste T1_01 T1_02 T1_03
Except that in this case the rows of T1_03 will not line up with the other two, as it is missing several entries. So what I am looking for is a way to say something like:
for all i in files T1_01,T1_02,T1_03
if T1_01 $1 == T1_02 $1 == T1_03 $6
# then print T1_01[i] T1_02[i] T1_03[i] \n,
else
# print T1_01[i] T1_02[i] -99.9 (for all blanks)
fi
done
Or conversely, use my join statement above and print all lines in BOTH tables joined, or perhaps some sort of paste | join?? Not sure about that last idea as I haven’t found anything that really works yet.
Additionally, I can put the -99.9 values in later with:
sed -i -e 's/ / 99.9 -99.9 -99.9 -99.9 -99.9 -99.9 -99.9 -99.9/' T1_0123
And I can manually add headers as well, so the main problem is getting the right paste result.
Hopefully I have phrased the question better this time, thanks everyone, for helping a new bash user!
This does what you want. The script assumes your data is in data1, data2 and data3. It writes all of the data into a temporary file, tagging each line according to its origin (lines from data1 get an “A” appended, and so on). It also adds the index on which to join to the beginning of lines from data3. The data is then sorted so that corresponding lines end up next to each other.
awk is then used to print the corresponding records together and to fill in placeholder data for entries missing from data3.
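A minimal sketch of that approach, written as a shell function, could look like the following. The function name join3 is made up for illustration, and the details (the tag letters A/B/C, a filler of eight -99.9 fields, building the header by simple concatenation, single-space separated columns) are assumptions based on the tables in the question, not a definitive implementation. It pipes the tagged lines straight into sort instead of using a temporary file.

```shell
#!/bin/sh
# join3 is a hypothetical helper name. Arguments: the three table files,
# keyed on column 1 (first), column 1 (second) and column 6 (third).
join3() {
    # Joined header: the first line of each file, side by side.
    printf '%s %s %s\n' "$(head -n 1 "$1")" "$(head -n 1 "$2")" "$(head -n 1 "$3")"

    # Skip headers and blank lines, append a tag naming the source file,
    # and copy the key of the third file (column 6) to the front of its lines.
    {
        awk 'NR > 1 && NF { print $0, "A" }' "$1"
        awk 'NR > 1 && NF { print $0, "B" }' "$2"
        awk 'NR > 1 && NF { print $6, $0, "C" }' "$3"
    } |
    LC_ALL=C sort -k1,1 |
    awk -v fill='-99.9 -99.9 -99.9 -99.9 -99.9 -99.9 -99.9 -99.9' '
        # Print one joined record; pad when the third file had no match.
        function emit() {
            if (key == "") return
            if (c == "") c = fill
            print a, b, c
            a = b = c = ""
        }
        {
            k = $1 ""                      # force a string comparison of keys
            if (k != key) { emit(); key = k }
            tag = $NF
            line = $0
            sub(/ [ABC]$/, "", line)       # drop the origin tag
            if (tag == "A")      a = line
            else if (tag == "B") b = line
            else { sub(/^[^ ]+ /, "", line); c = line }   # drop the copied key
        }
        END { emit() }
    '
}
```

It would be called as join3 data1 data2 data3 > T1_0123. Because the awk stage collects lines per key rather than relying on their order within a group, only the key column needs to be sorted. The sketch pads only for a row missing from the third file; a key missing from the second file would need the same kind of filler.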
You should be able to adjust it to your needs if that’s not exactly what you wanted; otherwise, drop a comment 🙂