# Say I have a situation like this
user_id = c(1:5,1:5)
time = c(1:10)
visit_log = data.frame(user_id, time)
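# And I've written a function to calculate the interval
interval <- function(data) {
  interval = c(Inf)  # the first visit has no previous visit
  # start at the second row so that data$time[i-1] is always a valid index
  for (i in seq_len(length(data$time))[-1]) {
    intv = data$time[i] - data$time[i-1]
    interval = append(interval, intv)
  }
  data$interval = interval
  return (data)
}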
# But when I want to get the intervals by user_id and bind them to the data.frame,
# I can't find a proper way.
# Is there any method to get something like
new_data = merge(by(visit_log, INDICE=visit_log$user_id, FUN=interval))
#And the result should be
   user_id time interval
1        1    1      Inf
2        2    2      Inf
3        3    3      Inf
4        4    4      Inf
5        5    5      Inf
6        1    6        5
7        2    7        5
8        3    8        5
9        4    9        5
10       5   10        5
We can replace your loop with the diff() function, which computes the differences between adjacent elements of a vector. We can then prepend Inf to the differences via c(Inf, diff(x)).

The next thing we need is to apply the above to each user_id individually. For that there are many options, but here I use aggregate(). Confusingly, this function returns a data frame with a time component that is itself a matrix. We need to convert that matrix to a vector, relying upon the fact that in R, matrices are filled column by column. Finally, we add an interval column to the input data as per your original version of the function.

A slightly expanded example, with 3 time points per user, helps illustrate this approach.
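The steps described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code the answer used: the function name interval_by_user is hypothetical, and it assumes every user has the same number of visits and that rows are ordered by visit round and then by user_id, as in the question's data. If either assumption fails, the column-wise unrolling of the matrix would not line up with the rows.

```r
# diff() returns the differences between adjacent elements of a vector
x <- c(1, 6, 11)
diff(x)          # 5 5
c(Inf, diff(x))  # Inf 5 5

# Hypothetical helper name. Assumes equal visit counts per user and rows
# ordered by visit round, then user_id (as in the question's data).
interval_by_user <- function(x) {
  diffs <- aggregate(time ~ user_id, data = x,
                     FUN = function(y) c(Inf, diff(y)))
  # diffs$time is a matrix: one row per user, one column per visit.
  # as.numeric() unrolls it column by column, which matches the row
  # order of x under the assumptions stated above.
  x$interval <- as.numeric(diffs$time)
  x
}

# Expanded example: 5 users, 3 time points each
visit_log <- data.frame(user_id = rep(1:5, times = 3), time = 1:15)
new_data <- interval_by_user(visit_log)
new_data$interval  # Inf for rows 1-5, then 5 for rows 6-15
```

For data that is not ordered this way, or where users have different numbers of visits, a grouping function that preserves row order (for example ave() in base R) may be a safer choice.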