I have a long vector x and another vector v, which contains lengths. I would like to sum x so that the answer y is a vector of length length(v), where y[1] is sum(x[1:v[1]]), y[2] is sum(x[(1+v[1]):(v[1]+v[2])]), and so on. Essentially this is sparse matrix multiplication from a space of dimension length(x) to one of dimension length(v). However, I would prefer not to bring in "advanced machinery", although I might have to. It does need to be very, very fast. Can anyone think of anything simpler than using a sparse matrix package?
Example –
x <- c(1,1,3,4,5)
v <- c(2,3)
y <- myFunc(x,v)
y should be c(2,12)
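As a simple base-R reference point (not taken from the question), the stretches can be turned into a group label for every element of x and summed with rowsum(). The function name group_sums is hypothetical, and the sketch assumes every entry of v is positive and that sum(v) == length(x); a zero-length stretch would simply be dropped from the result.

```r
# Sum consecutive stretches of x whose lengths are given in v,
# by giving every element of x a group label (base R only).
# Assumes every entry of v is positive and sum(v) == length(x).
group_sums <- function(x, v) {
  stopifnot(sum(v) == length(x))            # stretches must cover x exactly
  g <- rep.int(seq_along(v), v)             # v = c(2, 3) gives 1 1 2 2 2
  as.vector(rowsum(x, g, reorder = FALSE))  # one sum per group, in order
}

x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)
group_sums(x, v)
# [1]  2 12
```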
I am open to any pre-processing, e.g., storing in v the starting indexes of each stretch.
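A minimal sketch of that pre-processing, assuming the lengths in v are all positive: the start and end index of each stretch both follow from cumsum(v). The vapply() loop that uses them is only meant to make the indexing explicit; it is not tuned for speed.

```r
# Hypothetical pre-processing: derive start and end indexes from the lengths.
# Assumes every entry of v is positive (a zero length would give starts > ends).
v <- c(2, 3)
ends   <- cumsum(v)       # last index of each stretch: 2 5
starts <- ends - v + 1    # first index of each stretch: 1 3

x <- c(1, 1, 3, 4, 5)
# Straightforward, loop-based use of the start/end indexes.
y <- vapply(seq_along(starts),
            function(i) sum(x[starts[i]:ends[i]]),
            numeric(1))
y
# [1]  2 12
```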
This looks like it’s doing extra work because it’s computing the cumsum for the whole vector, but it’s actually faster than the other solutions so far, for both small and large numbers of groups.
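A minimal sketch of the cumulative-sum idea, which may differ in detail from the answer's own code: take one cumsum() over the whole of x, read off the running total at the end of each stretch, and difference those totals. The name cumsum_groups is hypothetical. Padding the running totals with a leading 0 keeps zero-length stretches (including a leading one) correct.

```r
# Cumulative-sum approach: one pass over x, then one lookup per stretch.
cumsum_groups <- function(x, v) {
  cs <- c(0, cumsum(x))      # cs[k + 1] is sum(x[1:k]); cs[1] is 0
  ends <- cumsum(v)          # index of the last element of each stretch
  diff(cs[c(1, ends + 1)])   # total at each end minus total at the previous end
}

x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)
cumsum_groups(x, v)
# [1]  2 12
```

Because each result is a difference of two running totals, double-precision answers can differ from direct sums in the last few bits when the totals get large, and cumsum() on an integer x returns NA with a warning on integer overflow; converting with as.numeric(x) first avoids the latter.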
Here’s how I simulated the data:
On my machine, the timings with n <- 10 are:
Changing to n <- 1e5, the timings are:
I suspect this is faster than doing matrix multiplication, even with a sparse matrix package, because one doesn’t have to form the matrix or do any multiplication. If more speed is needed, I suspect it could be sped up by writing it in C; that is not hard to do with the inline and Rcpp packages, but I’ll leave that to you.
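To make the comparison concrete, here is a small sketch of the matrix view from the question, written as a dense base-R indicator matrix rather than with a sparse package. The helper indicator_matrix is hypothetical, and a dense length(v) by length(x) matrix is only practical for small inputs; the point is to show the matrix that the cumulative-sum approach never has to build.

```r
# Dense indicator matrix: row i has a 1 in every column belonging to stretch i.
indicator_matrix <- function(v) {
  g <- rep.int(seq_along(v), v)        # group label for each column (element of x)
  outer(seq_along(v), g, "==") * 1     # length(v) x sum(v), stored as 0/1 doubles
}

x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)
M <- indicator_matrix(v)
y_matrix <- drop(M %*% x)              # matrix-vector product gives the group sums

# Same result from cumulative sums, without forming M.
y_cumsum <- diff(c(0, cumsum(x))[c(1, cumsum(v) + 1)])
all.equal(y_matrix, y_cumsum)
# [1] TRUE
```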