I have been trying to get the unique values in each column of a tab-delimited file in bash, so I used the following command:
cut -f <column_number> <filename> | sort | uniq -c
It works fine, and I get the unique values in a column together with their counts, like this:
105 Linux
55 MacOS
500 Windows
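To make the intermediate format concrete, here is a minimal sketch that runs the question's cut | sort | uniq -c step on a tiny made-up tab-separated file. The file layout (user name in column 1, OS in column 2) and its values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the question's counting step on a hypothetical tab-separated file
# (user name in column 1, OS name in column 2).
sample=$(mktemp)
printf '%s\t%s\n' \
    alice Linux  bob Windows   carol Windows \
    dave MacOS   erin Windows  frank Linux > "$sample"

# sort comes first because uniq -c only collapses *adjacent* duplicate lines.
# Each output line is "<count> <value>"; the count is right-aligned with
# leading spaces, and the padding width varies between implementations.
counts=$(cut -f 2 "$sample" | sort | uniq -c)
printf '%s\n' "$counts"
rm -f "$sample"
```

The values come out in alphabetical order with the count first (here 2 Linux, 1 MacOS, 3 Windows), which is the form any later count-based sorting has to work with.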
Instead of sorting by the column values (which in this example are OS names), I want to sort by count, highest first, and ideally have the count in the second column. So the output would look like this:
Windows 500
Linux 105
MacOS 55
How do I do this?
You can use the following, where N is the column number and F is the input file:
cut -f N F | sort | uniq -c | sort -nrk1,1 | awk '{print $2, $1}'
The initial sort/uniq gets each OS into the form <count> <os> so that the rest of the pipeline can work on it.
The sort -nrk1,1 then sorts numerically (n), in reverse order (r), using only the first field as the key (-k1,1).
The awk then simply reverses the order of the two columns.
You can test the full pipeline on a small sample file. Such a file can mirror the style of your own input, including tabs separating the fields, but it is unlikely to match your format exactly, so you'll need to tailor the cut command to your own file so that it extracts only the desired column. You've probably already done that, though, since that's not the part you're asking about.
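A minimal sketch of the whole pipeline (count, numeric reverse sort on the first field, column swap), run end to end on a small made-up tab-separated file. The helper name count_column, the sample layout (user name in column 1, OS in column 2) and its values are hypothetical; adapt the column number and file to your own data.

```shell
#!/usr/bin/env bash
# Sketch: unique values of one column, sorted by count (highest first),
# printed as "value count". count_column is a hypothetical helper name.
count_column() {
    local col="$1" file="$2"
    cut -f "$col" "$file" |   # keep only the wanted column
        sort |                # group identical values together
        uniq -c |             # -> "<count> <value>" per distinct value
        sort -nrk1,1 |        # numeric, reverse, key = first field only
        awk '{print $2, $1}'  # swap columns -> "<value> <count>"
}

# Hypothetical test input: user name in column 1, OS in column 2,
# fields separated by tabs.
sample=$(mktemp)
printf '%s\t%s\n' \
    alice Linux  bob Windows   carol Windows \
    dave MacOS   erin Windows  frank Linux > "$sample"

result=$(count_column 2 "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this sample the output is Windows 3, Linux 2 and MacOS 1, one per line. One caveat of the final step: awk '{print $2, $1}' keeps only the first word of each value, so a multi-word value such as "Mac OS" would be printed as "Mac 2". If your column can contain spaces, a final step such as awk '{n = $1; sub(/^ *[0-9]+ /, ""); print $0, n}' strips the count prefix instead of relying on field splitting.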