I am looking for a more specific version of the `:sort u` command that would allow removing all duplicate lines from a file. I am working with a CSV file and would like to remove all the lines that have duplicates in their second-column entry. In other words, two lines are declared to be duplicates if they have the same value in the second column.
For example, for the following file:
```
a,1,b
g,1,f
c,1,x
i,2,l
m,1,k
o,2,p
u,1,z
```
the command in question should yield:
```
a,1,b
i,2,l
```
The choice of the specific rows to be kept is not important, as long as the second-column entries are all unique.
What Vim command will produce the output above?
Since it is not possible to achieve the transformation in question in one run of the `:sort` command, let us approach it as a two-step process.

1. The first step is sorting lines by the values of the second column (separated from the first one by a comma). In order to do that, we can use the `:sort` command, passing a regular expression that matches the first column and the following comma: `:sort /^[^,]*,/`.
As `:sort` compares the text starting just after the match of the specified pattern on each line, it gives us the desired sorting behavior. To compare the values numerically rather than lexicographically, add the `n` flag: `:sort n /^[^,]*,/`.

2. The second step involves running through the sorted lines and removing all lines but one in every block of consecutive lines with the same value in the second column. It is convenient to build our implementation upon the `:global` command, which executes a given Ex command on every line matching a certain pattern. For our purposes, a line can be deleted if it contains the same value in the second column as the following line. This formalization, together with the initial assumption that commas cannot occur within column values, gives us the following pattern: `^[^,]*,\([^,]*\),.*\n[^,]*,\1,`.
If we run the `:delete` command on every line that satisfies this pattern, going from top to bottom over them in sorted order, we will have only a single line for every distinct value in the second column: `:g/^[^,]*,\([^,]*\),.*\n[^,]*,\1,/d`.
3. Finally, both steps can be combined into a single Ex command by separating them with `|`.
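A minimal sketch of that combined command, assuming (as above) that no field contains a comma and that every line has at least three columns, since the pattern also expects a comma right after the second field. The `n` flag only makes the ordering numeric and can be dropped, because any ordering that puts equal values next to each other is enough:

```vim
:sort n /^[^,]*,/ | g/^[^,]*,\([^,]*\),.*\n[^,]*,\1,/d
```

On the sample file from the question this should leave `u,1,z` followed by `o,2,p`. The last line of each group survives rather than the first, which the question explicitly allows.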
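The deletion pattern is dense, so here is a sketch that spells out its pieces as Vim script comments and runs the two steps separately, for example from a file loaded with `:source`. The variable name `dup_pat` is hypothetical, and the same assumptions hold: no commas inside fields and at least three columns per line.

```vim
" Hypothetical helper variable; assumes no field contains a comma
" and every line has at least three columns.
"   ^[^,]*,       first column and the comma after it
"   \([^,]*\),    second column, captured as \1, and its comma
"   .*\n          rest of the current line and the line break
"   [^,]*,\1,     next line: any first column, then the same second column
let dup_pat = '^[^,]*,\([^,]*\),.*\n[^,]*,\1,'

" Step 1: sort on the text after the first comma, grouping equal values.
sort /^[^,]*,/

" Step 2: delete every line whose successor has the same second column.
execute 'global/' . dup_pat . '/delete'
```

Building the command with `:execute` keeps the pattern in one place; typing the two Ex commands directly works just as well.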