Suppose I have a vector and I don’t know, apriori, its unique elements (here:


Asked: June 11, 2026


Suppose I have a vector and I don’t know, a priori, its unique elements (here: 1 and 2).

vec <-
  c(1, 1, 1, 2, 2, 2, 2)

I would like to know whether there is a better (or more elegant) way of getting the count of each unique element in vec, i.e. the same result as table(vec). It doesn’t matter whether the result is a data.frame or a named vector.

R> table(vec)
vec
1 2 
3 4

Reason: I was curious whether there is a better way. I also noticed that the base implementation contains a for loop (in addition to a .C call). I don’t know whether that is a big concern, but when I do something like

R> table(rep(1:1000,100000))

R takes a really long time. I am sure it’s because of the huge vector (rep(1:1000, 100000) has 10^8 elements). But is there a way to make it faster?

EDIT: In addition to Chase's answer, this also does a good job:

R> rle(sort(sampData))
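The rle(sort(...)) trick works because sorting places equal values next to each other, so each run length is the count of one distinct value. A minimal sketch of turning its output into a named vector like the one table(vec) prints (the names `runs` and `counts` are only illustrative):

```r
vec <- c(1, 1, 1, 2, 2, 2, 2)
# sorting groups equal values into consecutive runs
runs <- rle(sort(vec))
# run lengths are the counts; run values are the distinct elements
counts <- setNames(runs$lengths, runs$values)
print(counts)
```

Unlike table(), this returns a plain named integer vector rather than an object of class "table".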


1 Answer

Editorial Team · Answer 1 · 2026-06-11T14:49:23+00:00

This is an interesting problem; I’m curious to see other thoughts on this. Looking at the source for table() reveals that it builds on tabulate(). tabulate() has a few quirks: it only counts positive integers, it returns an integer vector without names, and that vector has one bin for every integer from 1 up to the maximum value, so integers that never occur get a count of zero. We can use unique() on our vector to supply the names(). If you need to tabulate zero or negative values, you would have to go back to table(), because tabulate() ignores them (per the examples on its help page).
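To make those quirks concrete, here is a small sketch with made-up inputs showing the two cases worth knowing before relying on tabulate(): gaps in the values and non-positive values. The variable names `gaps` and `nonpos` are illustrative only.

```r
# tabulate() counts how often each of 1, 2, ..., nbins occurs;
# nbins defaults to the largest value (at least 1)
gaps <- tabulate(c(1, 3, 3))      # 2 never occurs, so its bin is 0
nonpos <- tabulate(c(-1, 0, 2))   # -1 and 0 fall outside 1..nbins and are dropped
print(gaps)    # [1] 1 0 2
print(nonpos)  # [1] 0 1
```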

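table2 <- function(data) {
    # counts of 1, 2, ..., max(data); assumes data holds positive integers
    x <- tabulate(data)
    # distinct values that actually occur, in ascending order
    y <- sort(unique(data))
    # keep only the bins of values that occur, so counts line up with names
    x <- x[y]
    names(x) <- y
    x
}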
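If zeros, negative numbers, decimals or strings need to be counted, one option (a sketch, not benchmarked here) is to first map each value to its position among the sorted distinct values with match(), so that tabulate() only ever sees the positive integers 1..k. The helper name `table3` is hypothetical, and the extra match() pass will cost some time relative to table2().

```r
# Hypothetical helper: counts any atomic vector by mapping values to 1..k
table3 <- function(data) {
  # distinct values in ascending order; sort() drops NA by default
  u <- sort(unique(data))
  # position of each element within u; NA stays NA and tabulate() skips it
  idx <- match(data, u)
  counts <- tabulate(idx, nbins = length(u))
  names(counts) <- u
  counts
}

table3(c(-1, 0, 0, 2))
table3(c("b", "a", "b"))
```

As with table() under its defaults, NA values are left out of the counts.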

And a quick test:

> set.seed(42)
> sampData <- sample(1:5, 10000000, TRUE, prob = c(.3,.25, .2, .15, .1))
> 
> system.time(table(sampData))
   user  system elapsed 
  4.869   0.669   5.503 
> system.time(table2(sampData))
   user  system elapsed 
  0.410   0.200   0.605 
> 
> table(sampData)
sampData
      1       2       3       4       5 
2999200 2500232 1998652 1500396 1001520 
> table2(sampData)
      1       2       3       4       5 
2999200 2500232 1998652 1500396 1001520

EDIT: I just realized there is a count() function in plyr, which is another alternative to table(). In the test above it is faster than table() (about 2.5 s vs. 5.5 s elapsed) but slower than the hack-job solution I put together (about 0.6 s):

> library(plyr)
> system.time(count(sampData))
   user  system elapsed 
  1.620   0.870   2.483
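Since the question says a data.frame is also acceptable, here is a minimal base-R sketch that returns the counts in that form using tabulate(). It assumes positive integers, like table2(), and the column names `value` and `count` are arbitrary choices.

```r
vec <- c(1, 1, 1, 2, 2, 2, 2)
counts <- tabulate(vec)
# one row per integer that actually occurs (skip zero-count bins)
present <- which(counts > 0)
freqs <- data.frame(value = present, count = counts[present])
print(freqs)
```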
