I’m trying to combine two lists, joining them by a common field suchs as ENST00000371026. I’ve tried the following but no luck. What is the actual way to do it?
cat> gar1.txt <<EOF
ENST00000371026 ENSG00000152763
ENST00000371023 ENSG00000152763
ENST00000395250 ENSG00000152763
ENST00000309502 ENSG00000163485
ENST00000377464 ENSG00000142599
ENST00000400908 ENSG00000142599
ENST00000337907 ENSG00000142599
ENST00000400907 ENSG00000142599
ENST00000401087 ENSG00000179571
EOF
cat> gar2.txt <<EOF
DDX11L1 ENST00000371026
DDX11L9 ENST00000309502
DDX11L1 ENST00000371026
OR4F5 ENST00000377464
DQ597235 n/a
DQ599768 n/a
LOC388312 ENST00000401087
LOC100132287 ENST00000425496
LOC100132287 ENST00000425496
EOF
join -t"\t" -2 2 -1 1 gar1.txt gar2.txt
An example output would be:
DDX11L1 ENSG00000152763
DDX11L9 ENSG00000163485
OR4F5 ENSG00000142599
LOC388312 ENSG00000179571
LOC100132287 NONE
Give this a try:
It uses Bash process substitution (
<()), but you could simply presort your files if you don’t want to use that. Your version ofjoinneeds to have the-ooption or you could use awk to process the output further. For the “n/a” lines it will print “NONE”, but you could do this to eliminate them:Edit:
This is the output I get without the
sedanduniq:And this is what I get with them:
Adding the
grep:which matches your example output except for the order. If you need to preserve the order it would be possible to do.