I have dataframe that was created from the fusion of two dataframes. Both spanned

Asked: May 27, 2026 (2026-05-27T11:26:17+00:00)

I have a dataframe that was created by merging two dataframes. Both span the same time interval but contain different information. When I put them together, the rows overlap, because one of the dataframes has no gaps in its time coverage. Here is an example where the rows “sp=A and B” come from the first dataframe and the rows “sp=C” come from the second. The first dataframe is continuous, but the second consists of sporadic events. The resulting dataframe looks like this:

start                  end                         sp
2010-06-01 17:00:00    2010-06-01 19:30:00         A
2010-06-01 19:30:01    2010-06-01 20:00:00         B
2010-06-01 19:45:00    2010-06-01 19:55:00         C
2010-06-01 20:00:01    2010-06-01 20:30:00         A
2010-06-01 20:05:00    2010-06-01 20:10:00         C
2010-06-01 20:12:00    2010-06-01 20:15:00         C
2010-06-01 20:30:01    2010-06-01 20:40:00         B
2010-06-01 20:35:00    2010-06-01 20:40:10         C
2010-06-01 20:40:01    2010-06-01 20:50:00         A

I would like to prioritize “C” so when it overlaps the time interval of another “sp”, the time interval of “A” or “B” is cut accordingly. As seen in the example, I sometimes have multiple events of “C” that overlap a single event of “A” or “B”. The result would be this:

start                  end                         sp
2010-06-01 17:00:00    2010-06-01 19:30:00         A
2010-06-01 19:30:01    2010-06-01 19:44:59         B
2010-06-01 19:45:00    2010-06-01 19:55:00         C
2010-06-01 19:55:01    2010-06-01 20:00:00         B
2010-06-01 20:00:01    2010-06-01 20:04:59         A
2010-06-01 20:05:00    2010-06-01 20:10:00         C
2010-06-01 20:10:01    2010-06-01 20:11:59         A
2010-06-01 20:12:00    2010-06-01 20:15:00         C
2010-06-01 20:15:01    2010-06-01 20:30:00         A
2010-06-01 20:30:01    2010-06-01 20:34:59         B
2010-06-01 20:35:00    2010-06-01 20:40:10         C
2010-06-01 20:40:11    2010-06-01 20:50:00         A

My date/time columns are in POSIXct. Don’t hesitate to ask if something is unclear.
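
For anyone who wants to reproduce the example without a data file, the merged input table from the start of the question can be built directly with POSIXct columns. This is only a sketch: the helper name `to_time` and the UTC time zone are assumptions, since the question does not state a time zone.

```r
# Hypothetical helper: parse "YYYY-mm-dd HH:MM:SS" strings as POSIXct.
# The UTC time zone is an assumption; the question does not give one.
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

dfall <- data.frame(
  start = to_time(c("2010-06-01 17:00:00", "2010-06-01 19:30:01",
                    "2010-06-01 19:45:00", "2010-06-01 20:00:01",
                    "2010-06-01 20:05:00", "2010-06-01 20:12:00",
                    "2010-06-01 20:30:01", "2010-06-01 20:35:00",
                    "2010-06-01 20:40:01")),
  end   = to_time(c("2010-06-01 19:30:00", "2010-06-01 20:00:00",
                    "2010-06-01 19:55:00", "2010-06-01 20:30:00",
                    "2010-06-01 20:10:00", "2010-06-01 20:15:00",
                    "2010-06-01 20:40:00", "2010-06-01 20:40:10",
                    "2010-06-01 20:50:00")),
  sp    = c("A", "B", "C", "A", "C", "C", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Split back into the continuous series (A, B) and the sporadic events (C).
dfAB <- dfall[dfall$sp %in% c("A", "B"), ]
dfC  <- dfall[dfall$sp == "C", ]
```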

Thanks in advance

1 Answer

Editorial Team · Answer 1 · 2026-05-27T11:26:18+00:00

Here’s a nice way to do this with the plyr package and a recursive function:

library(plyr)

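# Trim one A/B row (arow) so that it no longer overlaps any priority interval in df.
# Intervals are inclusive, hence the -1 / +1 second offsets at the cut points.
splitTimes <- function(arow, df) {
  # arow lies entirely inside a df interval
  overlap_all    = arow$start > df[, 'start'] & arow$end < df[, 'end']
  # a df interval lies entirely inside arow
  overlap_middle = arow$start < df[, 'start'] & arow$end > df[, 'end']
  # the end of arow falls inside a df interval
  overlap_end    = arow$start < df[, 'start'] & arow$end > df[, 'start'] & arow$end < df[, 'end']
  # the start of arow falls inside a df interval
  overlap_start  = arow$start > df[, 'start'] & arow$end > df[, 'end'] & arow$start < df[, 'end']

  if(any(overlap_all)) {
    # fully covered: drop the row
    data.frame()
  } else if(any(overlap_middle)) {
    # split around the first enclosed interval, then recurse on both pieces
    outrows = rbind(data.frame(start=arow$start, end=df[overlap_middle, 'start'][1]-1, sp=arow$sp),
                    data.frame(start=df[overlap_middle, 'end'][1]+1, end=arow$end, sp=arow$sp))
    ddply(outrows, 'start', 'splitTimes', df)
  } else if(any(overlap_end)) {
    # cut the tail of arow
    data.frame(start=arow$start, end=df[overlap_end, 'start']-1, sp=arow$sp)
  } else if(any(overlap_start)) {
    # cut the head of arow
    data.frame(start=df[overlap_start, 'end']+1, end=arow$end, sp=arow$sp)
  } else {
    # no overlap: keep the row unchanged
    arow
  }
}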

Then you can do:

> dfall = read.table('data.txt', header=T, colClasses=c('POSIXct', 'POSIXct', 'factor'))

> dfAB = subset(dfall, sp %in% c('A', 'B'))
> dfC  = subset(dfall, sp == 'C')

> arrange(rbind(ddply(dfAB, 'start', 'splitTimes', dfC), dfC), start)
                 start                 end sp
1  2010-06-01 17:00:00 2010-06-01 19:30:00  A
2  2010-06-01 19:30:01 2010-06-01 19:44:59  B
3  2010-06-01 19:45:00 2010-06-01 19:55:00  C
4  2010-06-01 19:55:01 2010-06-01 20:00:00  B
5  2010-06-01 20:00:01 2010-06-01 20:04:59  A
6  2010-06-01 20:05:00 2010-06-01 20:10:00  C
7  2010-06-01 20:10:01 2010-06-01 20:11:59  A
8  2010-06-01 20:12:00 2010-06-01 20:15:00  C
9  2010-06-01 20:15:01 2010-06-01 20:30:00  A
10 2010-06-01 20:30:01 2010-06-01 20:34:59  B
11 2010-06-01 20:35:00 2010-06-01 20:40:10  C
12 2010-06-01 20:40:11 2010-06-01 20:50:00  A

which gives you exactly what you want.
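
A quick way to confirm that a result like this is free of overlaps is to sort by start and check that each interval begins strictly after the previous one ends. The sketch below is only an illustration: the function name `no_overlaps` and the helper `ts` are made up here, and the two small data frames reuse the B and C rows around 19:45 from the example.

```r
# Hypothetical check: TRUE when no two intervals in df overlap.
# After sorting by start, any overlap must show up between neighbouring rows.
no_overlaps <- function(df) {
  df <- df[order(df$start), ]
  n <- nrow(df)
  if (n < 2) return(TRUE)
  all(df$start[-1] > df$end[-n])
}

# Hypothetical helper; the UTC time zone is an assumption.
ts <- function(x) as.POSIXct(x, tz = "UTC")

# B and C as they appear in the merged input (C sits inside B).
before <- data.frame(
  start = ts(c("2010-06-01 19:30:01", "2010-06-01 19:45:00")),
  end   = ts(c("2010-06-01 20:00:00", "2010-06-01 19:55:00")),
  sp    = c("B", "C")
)

# The same stretch after B has been cut around C.
after <- data.frame(
  start = ts(c("2010-06-01 19:30:01", "2010-06-01 19:45:00", "2010-06-01 19:55:01")),
  end   = ts(c("2010-06-01 19:44:59", "2010-06-01 19:55:00", "2010-06-01 20:00:00")),
  sp    = c("B", "C", "B")
)

no_overlaps(before)  # FALSE
no_overlaps(after)   # TRUE
```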

There might be some small bugs in the other cases, since your example data set doesn’t cover them all, but this is the general idea at least. Hope it helps. Good luck!
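
If the edge cases are a concern (for example, a C event that starts or ends at exactly the same second as an A or B row), one alternative is to walk through the C intervals in order and emit whatever is left of each A/B row. The following is a minimal base-R sketch of that idea, not the method used in this answer. The names `cut_one`, `prioritize` and `ts` are hypothetical, and it assumes POSIXct `start`/`end` columns, an `sp` column, and inclusive intervals at one-second resolution as in the example data.

```r
# Remove every interval in `high` (sorted by start) from the single interval [s, e].
cut_one <- function(s, e, sp, high) {
  pieces <- list()
  cur <- s                               # first second not yet kept or removed
  for (i in seq_len(nrow(high))) {
    hs <- high$start[i]
    he <- high$end[i]
    if (he < cur || hs > e) next         # no overlap with what is left
    if (hs > cur) {                      # keep the part before the priority interval
      pieces[[length(pieces) + 1]] <-
        data.frame(start = cur, end = hs - 1, sp = sp, stringsAsFactors = FALSE)
    }
    cur <- he + 1                        # resume one second after it ends
    if (cur > e) break
  }
  if (cur <= e) {                        # keep whatever remains at the end
    pieces[[length(pieces) + 1]] <-
      data.frame(start = cur, end = e, sp = sp, stringsAsFactors = FALSE)
  }
  do.call(rbind, pieces)                 # NULL when the row is fully covered
}

# Cut all rows of `low` around the rows of `high`, then merge and sort.
prioritize <- function(low, high) {
  high <- high[order(high$start), ]
  cut <- do.call(rbind, lapply(seq_len(nrow(low)), function(i) {
    cut_one(low$start[i], low$end[i], as.character(low$sp[i]), high)
  }))
  keep <- data.frame(start = high$start, end = high$end,
                     sp = as.character(high$sp), stringsAsFactors = FALSE)
  out <- rbind(cut, keep)
  out <- out[order(out$start), ]
  rownames(out) <- NULL
  out
}

# Small demo: one A row with two C events inside it (values from the example).
ts <- function(x) as.POSIXct(x, tz = "UTC")   # UTC is an assumption
low  <- data.frame(start = ts("2010-06-01 20:00:01"),
                   end   = ts("2010-06-01 20:30:00"), sp = "A")
high <- data.frame(start = ts(c("2010-06-01 20:05:00", "2010-06-01 20:12:00")),
                   end   = ts(c("2010-06-01 20:10:00", "2010-06-01 20:15:00")),
                   sp    = "C")
result <- prioritize(low, high)
```

With the question's data, calling `prioritize(dfAB, dfC)` should play the same role as the `ddply` call above. Because the comparisons here include equal endpoints, rows that share a boundary with a C event are trimmed as well.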
