Edit 2019: This question was asked prior to changes in data.table in November 2016,

Question

Asked: June 8, 2026

Edit 2019: This question was asked prior to changes in data.table in November 2016, see the accepted answer below for both the current and previous methods.

I have a data.table with about 2.5 million rows and two columns. I want to remove any rows that are duplicated in both columns. Previously, for a data.frame, I would have done this:
df <- unique(df[, c('V1', 'V2')]) but this doesn’t work with data.table. I have tried unique(df[, c("V1", "V2"), with=FALSE]), but it still seems to operate only on the key of the data.table and not on the whole row.

Any suggestions?

Cheers,
Davy

Example

>dt
      V1   V2
[1,]  A    B
[2,]  A    C
[3,]  A    D
[4,]  A    B
[5,]  B    A
[6,]  C    D
[7,]  C    D
[8,]  E    F
[9,]  G    G
[10,] A    B

In the above data.table, where V2 is the table key, only rows 4, 7, and 10 would be removed.

dt <- data.table::data.table(
  V1 = c("B", "A", "A", "A", "A", "A", "C", "C", "E", "G"),
  V2 = c("A", "B", "B", "B", "C", "D", "D", "D", "F", "G")
)

1 Answer

Editorial Team · Answer 1 · 2026-06-08T23:51:08+00:00

For v1.9.8+ (released November 2016)

From ?unique.data.table
By default all columns are being used (which is consistent with ?unique.data.frame)

unique(dt)
   V1 V2
1:  A  B
2:  A  C
3:  A  D
4:  B  A
5:  C  D
6:  E  F
7:  G  G

Or use the by argument to get unique combinations of specific columns (which is what keys were previously used for):

unique(dt, by = "V2")
   V1 V2
1:  A  B
2:  A  C
3:  A  D
4:  B  A
5:  E  F
6:  G  G
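
The by argument also accepts several column names at once. As a minimal sketch, assuming data.table v1.9.8 or later and the ten rows from the question's example (rebuilt here so the block runs on its own; the name dedup is only illustrative), naming both columns should give the same seven rows as the default call:

```r
library(data.table)

# Same ten rows as the example in the question.
dt <- data.table(
  V1 = c("A", "A", "A", "A", "B", "C", "C", "E", "G", "A"),
  V2 = c("B", "C", "D", "B", "A", "D", "D", "F", "G", "B")
)

# Deduplicate on the combination of V1 and V2;
# the first occurrence of each pair is kept, in the original row order.
dedup <- unique(dt, by = c("V1", "V2"))
nrow(dedup)  # 7
```

Listing the columns explicitly is mostly useful when the table has more columns than the ones that define a duplicate.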

Prior to v1.9.8

From ?unique.data.table, it is clear that calling unique on a data.table only works on the key. This means you have to reset the key to all columns before calling unique.

library(data.table)
dt <- data.table(
  V1=LETTERS[c(1,1,1,1,2,3,3,5,7,1)],
  V2=LETTERS[c(2,3,4,2,1,4,4,6,7,2)]
)

Calling unique with one column as key:

setkey(dt, "V2")
unique(dt)
     V1 V2
[1,]  B  A
[2,]  A  B
[3,]  A  C
[4,]  A  D
[5,]  E  F
[6,]  G  G
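
The key reset mentioned above can be sketched as follows. This is a minimal sketch, assuming setkeyv() with names(dt) is used to key on every column; the name dedup is only illustrative. On v1.9.8+ the reset is not needed, since unique() already considers all columns.

```r
library(data.table)

dt <- data.table(
  V1 = LETTERS[c(1, 1, 1, 1, 2, 3, 3, 5, 7, 1)],
  V2 = LETTERS[c(2, 3, 4, 2, 1, 4, 4, 6, 7, 2)]
)

# Key on every column; this also sorts the rows by V1, then V2.
setkeyv(dt, names(dt))

# With all columns in the key, unique() compares whole rows.
dedup <- unique(dt)
nrow(dedup)  # 7
```

Note that setting the key reorders the table, so the result comes back sorted rather than in the original row order.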
