I sometimes need to compare two text files. Obviously, diff shows the differences; it also hides the similarities, which is kind of the point.
Suppose I want to do other comparisons on these files: set union, intersection, and subtraction, treating each line as an element in the set.
Are there similarly simple common utilities or one-liners which can do this?
Examples:
a.txt
john
mary
b.txt
adam
john
$> set_union a.txt b.txt
john
mary
adam
$> set_intersection a.txt b.txt
john
$> set_difference a.txt b.txt
mary
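Union:
    sort -u files…
Intersection:
    sort files… | uniq -d
Overall difference (elements which are in just one of the files):
    sort files… | uniq -u
Mathematical difference (elements which are only in fileX):
    sort files… | uniq -u | sort - <(sort -u fileX) | uniq -d
The first two commands of that last pipeline give me all elements that occur exactly once across the files. Then we merge this list with the file we're interested in. Command breakdown for
    sort - <(sort -u fileX)
The - makes sort read stdin (i.e. the list of all unique elements).
<(...) runs a command and passes sort a path from which that command's output can be read. This is process substitution (bash, zsh, ksh); the path is typically a named pipe or a /dev/fd entry rather than a real temporary file.
So this gives us a mix of all unique elements plus all unique elements in fileX. The duplicates are then the unique elements which are only in fileX.
Note that the intersection and both difference commands assume that no file contains the same line twice; if that is not guaranteed, deduplicate each file with sort -u first.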
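To check the commands against the example from the question, here is a small sketch that recreates a.txt and b.txt in a scratch directory and runs each operation. It assumes bash (for the <(...) syntax) and takes fileX to be a.txt; the variable names and the scratch directory are only for illustration.

```shell
#!/usr/bin/env bash
# Recreate the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
a="$workdir/a.txt"
b="$workdir/b.txt"
printf 'john\nmary\n' > "$a"
printf 'adam\njohn\n' > "$b"

# Union: every distinct line from either file.
union=$(sort -u "$a" "$b")

# Intersection: lines present in both files
# (assumes neither file repeats a line internally).
intersection=$(sort "$a" "$b" | uniq -d)

# Overall (symmetric) difference: lines present in exactly one file.
symmetric=$(sort "$a" "$b" | uniq -u)

# Mathematical difference a - b: lines that are unique overall
# and also appear in a.txt (fileX is a.txt here).
difference=$(sort "$a" "$b" | uniq -u | sort - <(sort -u "$a") | uniq -d)

printf 'union:\n%s\n' "$union"
printf 'intersection:\n%s\n' "$intersection"
printf 'overall difference:\n%s\n' "$symmetric"
printf 'a minus b:\n%s\n' "$difference"
```

With the sample files this prints adam, john, mary for the union, john for the intersection, adam and mary for the overall difference, and mary for a minus b. The union comes out sorted, so the order differs from the example in the question even though the set is the same.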