I’d like to join two files in bash using a common column. I want

Asked: May 28, 2026

I’d like to join two files in bash on a common column, keeping all pairable and unpairable lines from both files. Unfortunately, with join I can keep unpairable lines from only one file, e.g. join -1 1 -2 2 -a1 -t" ".
I’d also like to keep every pairing for entries that are repeated in the join column in both files.
For example, if file1 is
x id1 a b
x id1 c d
x id1 d f
x id2 c x
x id3 f v

and the second file is

id1 df cf
id1 ds dg
id2 cv df
id2 as ds
id3 cf cg

the resulting file should be:

x id1 a b df cf
x id1 a b ds dg
x id1 c d df cf
x id1 c d ds dg
x id1 d f df cf
x id1 d f ds dg
x id2 c x cv df
x id2 c x as ds
x id3 f v cf cg

That’s why I’ve always used SAS for such joins, after sorting on the appropriate columns:

data x;
merge file1 file2;
by common_column;
run;

It works fine, but:
1. I use Ubuntu most of the time, so I have to switch to Windows to merge data in SAS.
2. More importantly, SAS can truncate data entries that are too long.

That’s why I’d prefer to join my files in bash, but I don’t know the appropriate command.
Could someone help me, or point me to an appropriate resource?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T02:37:59+00:00

According to join’s man page, -a <filenum> also prints the unpairable lines from file <filenum>, where <filenum> is 1 or 2. So just add -a1 -a2 to your command line and you should be done. For example:

# cat a
1 blah
2 foo

# cat b
2 bar
3 baz

# join -1 1 -2 1 -t" " a b
2 foo bar

# join -1 1 -2 1 -t" " -a1 a b
1 blah
2 foo bar

# join -1 1 -2 1 -t" " -a2 a b
2 foo bar
3 baz

# join -1 1 -2 1 -t" " -a1 -a2 a b
1 blah
2 foo bar
3 baz

Is this what you were looking for?
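One caveat worth spelling out: join only pairs lines correctly when both inputs are sorted on the join field (the SAS workflow in the question also sorts first). Below is a minimal sketch, assuming bash (for process substitution) and GNU coreutils, that sorts both files on the fly before doing the full outer join. The helper name outer_join and the deliberately unsorted scratch files are hypothetical, made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: full outer join (-a1 -a2) on column 1 of two possibly unsorted files.
set -euo pipefail
export LC_ALL=C  # make sort and join agree on the collation order

# outer_join FILE1 FILE2 (hypothetical helper, not a standard command):
# sort each input on field 1, then keep pairable and unpairable lines from both.
outer_join() {
  join -1 1 -2 1 -a1 -a2 <(sort -k1,1 "$1") <(sort -k1,1 "$2")
}

# Recreate the example files in a scratch directory, deliberately unsorted.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '2 foo\n1 blah\n' > "$workdir/a"
printf '3 baz\n2 bar\n' > "$workdir/b"

outer_join "$workdir/a" "$workdir/b"
```

This prints the same three lines as the -a1 -a2 example above, even though the inputs were not sorted. Setting LC_ALL=C is a design choice: it avoids sort and join disagreeing about order under different locales.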

Edit:

Since you provided more detail, here is how to produce your desired output. Note that my file a is your first file and my file b is your second file. Because the id is field 2 of the first file and field 1 of the second, I had to change -1 1 -2 2 to -1 2 -2 1. I also added an output field list (-o) to format the output; note that ‘0’ in it stands for the join field:

# join -1 2 -2 1 -o 1.1,0,1.3,1.4,2.2,2.3 a b

produces the output you gave. If you add -a1 -a2 to retain unpairable lines from both files, you get two more lines (you can infer my test data from them):

x id4 u t
 id5   ui oi

That is rather unreadable, since every missing field is simply left empty between the space separators. So let’s fill the missing fields with ‘-’ using -e, which gives:

# join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,1.4,2.2,2.3 a b
x id1 a b df cf
x id1 a b ds dg
x id1 c d df cf
x id1 c d ds dg
x id1 d f df cf
x id1 d f ds dg
x id2 c x cv df
x id2 c x as ds
x id3 f v cf cg
x id4 u t - -
- id5 - - ui oi
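To check the final command end to end, here is a minimal bash sketch, assuming GNU coreutils (for sort -s) and bash process substitution. It recreates the question’s two files plus the extra id4 and id5 lines, which are inferred from the two extra output lines above; the helper name full_outer_join is hypothetical. Sorting first lets it handle unsorted input too, and -s (stable sort) keeps lines that share an id in their original file order, so the pairings come out in the order shown.

```shell
#!/usr/bin/env bash
# Sketch: full outer join of field 2 of file1 with field 1 of file2.
set -euo pipefail
export LC_ALL=C  # make sort and join agree on the collation order

# full_outer_join FILE1 FILE2 (hypothetical helper name):
# keeps unpairable lines from both files (-a1 -a2), prints "-" for
# missing fields (-e-) and formats the output with an explicit -o list.
full_outer_join() {
  # -s keeps lines with the same id in their original order.
  join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,1.4,2.2,2.3 \
    <(sort -s -k2,2 "$1") <(sort -s -k1,1 "$2")
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Question's data; the id4 and id5 lines are inferred test data.
cat > "$workdir/file1" <<'EOF'
x id1 a b
x id1 c d
x id1 d f
x id2 c x
x id3 f v
x id4 u t
EOF

cat > "$workdir/file2" <<'EOF'
id1 df cf
id1 ds dg
id2 cv df
id2 as ds
id3 cf cg
id5 ui oi
EOF

full_outer_join "$workdir/file1" "$workdir/file2"
```

Running it prints the eleven lines shown above: every pairing for the repeated ids, plus the unpairable id4 and id5 lines with ‘-’ in the missing fields.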
