I’m migrating from data frames and matrices to data tables, but haven’t found a


Asked: May 25, 2026 (2026-05-25T23:07:12+00:00)


I’m migrating from data frames and matrices to data tables, but haven’t found a solution for extracting the unique rows from a data table. I presume there’s something I’m missing about the [,J] notation, though I’ve not yet found an answer in the FAQ and intro vignettes. How can I extract the unique rows, without converting back to data frames?

Here is an example:

library(data.table)
set.seed(123)
a <- matrix(sample(2, 120, replace = TRUE), ncol = 3)
a <- as.data.frame(a)
b <- as.data.table(a)

# Confirm dimensionality
dim(a) # 40  3
dim(b) # 40  3

# Unique rows using all columns
dim(unique(a))  # 8 3
dim(unique(b))  # 34 3

# Unique rows using only a subset of columns
dim(unique(a[,c("V1","V2")]))   # 4 2
dim(unique(b[,list(V1,V2)]))    # 29 2

Related question: Is this behavior a result of the data being unsorted, as with the Unix uniq function?


1 Answer

Editorial Team · Answer 1 · 2026-05-25T23:07:13+00:00

Before data.table v1.9.8, the default behavior of the unique.data.table method was to use the key to determine the columns by which unique combinations were returned. If the key was NULL (the default), one would get the original data set back (as in the OP's situation).

As of data.table v1.9.8, the unique.data.table method uses all columns by default, which is consistent with unique.data.frame in base R. To have it use the key columns instead, explicitly pass by = key(DT) to unique (replacing DT in the call to key with the name of your data.table).

Hence, with the old behavior you would key the table on all columns first, something like

library(data.table) # v1.9.7 or earlier
set.seed(123)
a <- as.data.frame(matrix(sample(2, 120, replace = TRUE), ncol = 3))
b <- data.table(a, key = names(a))
## key(b)
## [1] "V1" "V2" "V3"
dim(unique(b)) 
## [1] 8 3

While for data.table v1.9.8+, just

b <- data.table(a) 
dim(unique(b)) 
## [1] 8 3
## or dim(unique(b, by = key(b))) # if b is keyed and you want uniqueness by the key columns only

Or, without making a copy, convert a to a data.table by reference with setDT

setDT(a)
dim(unique(a))
## [1] 8 3
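
For the subset-of-columns case from the question, here is a minimal sketch, assuming data.table v1.9.8 or later. The table DT and its values are made up for illustration (hand-built so the result does not depend on the random seed); they are not the OP's data. Passing a character vector to by deduplicates on those columns only and keeps the first row of each combination with all columns, while selecting the columns first returns just those columns.

```r
library(data.table)

# Hypothetical hand-built table; row 5 repeats row 1 exactly
DT <- data.table(V1 = c(1, 1, 2, 2, 1),
                 V2 = c(1, 1, 1, 2, 1),
                 V3 = c(1, 2, 1, 1, 1))

# Unique rows across all columns (the default from v1.9.8 on)
u_all <- unique(DT)

# Unique combinations of V1 and V2; the first row of each
# combination is kept, and all three columns are returned
u_sub <- unique(DT, by = c("V1", "V2"))

# Same combinations, but only the two selected columns are returned
u_cols <- unique(DT[, .(V1, V2)])

dim(u_all)
dim(u_sub)
dim(u_cols)
```

Which form to use depends on whether you still need the remaining columns: by = keeps them, taking values from the first matching row, whereas subsetting first drops them.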
