I need to sort a matrix so that all elements stay in their columns and each column is in ascending order. Is there a vectorized column-wise sort for a matrix or a data frame in R? (My matrix is all-positive and bounded by B, so I can add j*B to each cell in column j and do a regular one-dimensional sort:
> set.seed(100523); m <- matrix(round(runif(30),2), nrow=6); m
[,1] [,2] [,3] [,4] [,5]
[1,] 0.47 0.32 0.29 0.54 0.38
[2,] 0.38 0.91 0.76 0.43 0.92
[3,] 0.71 0.32 0.48 0.16 0.85
[4,] 0.88 0.83 0.61 0.95 0.72
[5,] 0.16 0.57 0.70 0.82 0.05
[6,] 0.77 0.03 0.75 0.26 0.05
> offset <- rep(seq_len(5), rep(6, 5)); offset
[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5
> m <- matrix(sort(m + offset), nrow=nrow(m)) - offset; m
[,1] [,2] [,3] [,4] [,5]
[1,] 0.16 0.03 0.29 0.16 0.05
[2,] 0.38 0.32 0.48 0.26 0.05
[3,] 0.47 0.32 0.61 0.43 0.38
[4,] 0.71 0.57 0.70 0.54 0.72
[5,] 0.77 0.83 0.75 0.82 0.85
[6,] 0.88 0.91 0.76 0.95 0.92
But is there something more beautiful already included?) Otherwise, what would be the fastest way if my matrix has around 1M (10M, 100M) entries (roughly a square matrix)? I’m worried about the performance penalty of apply and friends.
Actually, I don’t need “sort”, just “top n”, with n being around 30 or 100, say. I am thinking about using apply and the partial parameter of sort, but I wonder if this is cheaper than just doing a vectorized sort. So, before doing benchmarks on my own, I’d like to ask for advice by experienced users.
If you want to use sort,
?sortindicates thatmethod = "quick"can be twice as fast as the default method with on the order of 1 million elements.Start with
apply(m, 2, sort, method = "quick")and see if that provides sufficient speed.Do note the comments on this in
?sortthough; ties are sorted in a non-stable manner.