I have a big time series full in one dataframe and a list of

Question

0

Asked: June 10, 20262026-06-10T11:02:13+00:00 2026-06-10T11:02:13+00:00

I have a big time series full in one dataframe and a list of

0

I have a big time series full in one dataframe and a list of timestamps in a different dataframe test. I need to subset full with data points surrounding the timestamps in test. My first instinct (as an R noob) was to write the below, which was wrong

subs <- subset(full,(full$dt>test$dt-i) & (full$dt<test$dt+i))

Looking at the result I realized that R is looping through both the vectors simultaneously giving the wrong result. My option is to write a loop like the below:

subs<-data.frame()
for (j in test$dt) 
  subs <- rbind(subs,subset(full,full$dt>(j-i) & full$dt<(j+i)))

I feel that there might be a better way to do loops and this article implores us to avoid R loops as much as possible. The other reason is I might be hitting up against performance issues as this would be at the heart of an optimization algorithm. Any suggestions from gurus would be greatly appreciated.

EDIT:

Here is some reproducible code that shows the wrong approach as well as the approach that works but could be better.

#create a times series
full <- data.frame(seq(1:200),rnorm(200,0,1))
colnames(full)<-c("dt","val")

#my smaller array of points of interest
test <- data.frame(seq(5,200,by=23))
colnames(test)<-c("dt")

# my range around the points of interset
i<-3 

#the wrong approach
subs <- subset(full,(full$dt>test$dt-i) & (full$dt<test$dt+i))

#this works, but not sure this is the best way to go about it
subs<-data.frame()
for (j in test$dt) 
  subs <- rbind(subs,subset(full,full$dt>(j-i) & full$dt<(j+i)))

EDIT:
I updated the values to better reflect my usecase, and I see @mrdwab ‘s solution pulling ahead unexpectedly and by a wide margin.

I am using benchmark code from @mrdwab and the initialization is as follows:

set.seed(1)

full <- data.frame(
  dt  = 1:15000000,
  val = floor(rnorm(15000000,0,1))
)


test <- data.frame(dt = floor(runif(24,1,15000000)))

i <- 500

The benchmarks are:

       test replications elapsed relative
2    mrdwab            2    1.31  1.00000
3 spacedman            2   69.06 52.71756
1    andrie            2   93.68 71.51145
4  original            2  114.24 87.20611

Totally unexpected. Mind = blown. Can someone please shed some light in this dark corner and enlighten as to what is happening.

Important: As @mrdwab notes below, his solution works only if the vectors are integers. If not, @spacedman has the right solution

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-10T11:02:14+00:00

I don’t know if it’s any more efficient, but I would think you could also do something like this to get what you want:

subs <- apply(test, 1, function(x) c((x-2):(x+2)))
full[which(full$dt %in% subs), ]

I had to adjust your “3” to “2” since x would be included both ways.

Benchmarking (just for fun)

@Spacedman leads the way!

First, the required data and functions.

## Data
set.seed(1)

full <- data.frame(
  dt  = 1:200,
  val = rnorm(200,0,1)
)

test <- data.frame(dt = seq(5,200,by=23))

i <- 3 

## Spacedman's functions
cf = function(l,u){force(l);force(u);function(x){x>l & x<u}}
OR = function(f1,f2){force(f1);force(f2);function(x){f1(x)|f2(x)}}
funs = mapply(cf,test$dt-i,test$dt+i)
anyF = Reduce(OR,funs)

Second, the benchmarking.

## Benchmarking
require(rbenchmark)
benchmark(andrie = do.call(rbind, 
                           lapply(test$dt, 
                                  function(j) full[full$dt > (j-i) & 
                                    full$dt < (j+i), ])),
          mrdwab = {subs <- apply(test, 1, 
                                  function(x) c((x-(i-1)):(x+(i-1))))
                    full[which(full$dt %in% subs), ]},
          spacedman = full[anyF(full$dt),],
          original = {subs <- data.frame()
                      for (j in test$dt) 
                        subs <- rbind(subs, 
                                      subset(full, full$dt > (j-i) & 
                                        full$dt < (j+i)))},
          columns = c("test", "replications", "elapsed", "relative"),
          order = "relative")
#        test replications elapsed  relative
# 3 spacedman          100   0.064  1.000000
# 2    mrdwab          100   0.105  1.640625
# 1    andrie          100   0.520  8.125000
# 4  original          100   1.080 16.875000

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a big time series full in one dataframe and a list of

Leave an answerCancel reply

1 Answer

Benchmarking (just for fun)

Leave an answer
Cancel reply