I’m kinda new to R and just started using it to plot some graphs.
I have this code:
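times <- integer(nrow(df))
for (i in 1:nrow(df)) {
  # gap between the timestamp (column 4) of the next row and this row;
  # for the last row df[i + 1, 4] is NA, hence the trailing NA in times
  time <- df[i + 1, 4] - df[i, 4]
  times[i] <- time
}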
There must be a smarter way to do this without initializing times first, right?
I’m not sure, but what I’m searching for is something like:
times <- for(i in 1:nrow(df)) yield df[i+1,4]-df[i,4]
(I know this is not valid code :))
I hope this question hasn’t been asked already. I searched and didn’t find anything concrete on “yield” or on initializing arrays.
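R has no `yield`, but the apply family comes close to what is asked here: `vapply()` calls a function once per index and collects the return values into a new vector, so nothing has to be initialized by hand. A minimal sketch, assuming a small hypothetical `df` whose fourth column holds the timestamps in milliseconds (the values are taken from the measure.csv sample further down):

```r
# Hypothetical stand-in for df; only column 4 (timestamps in ms) is used
df <- data.frame(V1 = "08:00:27:ed:f3:e5", V2 = "TMCMESSAGEHANDLER",
                 V3 = c("END", "START", "END", "START"),
                 V4 = c(1319238175202, 1319238175690, 1319238176195, 1319238176665))

# vapply() builds the result vector itself, much like a comprehension with yield;
# numeric(1) declares that each call returns a single number
times <- vapply(seq_len(nrow(df) - 1),
                function(i) df[i + 1, 4] - df[i, 4],
                numeric(1))
print(times)
```

Looping over `seq_len(nrow(df) - 1)` instead of `1:nrow(df)` also avoids reading the non-existent row after the last one, so no trailing `NA` appears.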
As requested, here is some sample data from df:
7926 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7927 08:00:27:ed:f3:e5 MESSAGEHANDLER END 1.319242e+12
7928 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7929 08:00:27:ed:f3:e5 MESSAGEHANDLER END 1.319242e+12
7930 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7931 08:00:27:ed:f3:e5 MESSAGEHANDLER END 1.319242e+12
7932 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7933 08:00:27:ed:f3:e5 MESSAGEHANDLER END 1.319242e+12
7934 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7935 08:00:27:ed:f3:e5 MESSAGEHANDLER END 1.319242e+12
7936 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7937 08:00:27:ed:f3:e5 MESSAGEHANDLER END 1.319242e+12
7938 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7939 08:00:27:ed:f3:e5 MESSAGEHANDLER END 1.319242e+12
After my loop, times is:
[7921] 508 500 497 501 466 502 505 500 488 501 500 501 490 501 478 501 501 501
[7939] NA
OK, to be more concrete, this is what I really want to do:
times1 <- integer(nrow(df))
for (i in 1:nrow(df)) if (df[i, 3] == "START") times1[i] <- df[i + 1, 4] - df[i, 4]
times2 <- integer(nrow(df))
for (i in 1:nrow(df)) if (df[i, 3] == "END") times2[i] <- df[i + 1, 4] - df[i, 4]
The output for times1 is then something like:
[7921] 0 500 0 501 0 502 0 500 0 501 0 501 0 501 0 501 0 501
[7939] 0
But I need:
[3960] 500 501 502 500 501 501 501 501 501
In words:
I’m parsing measurement data from a CSV file, which ends up in df as shown above.
First case: “START” followed by “END”
A “START” in df[,3] means that a packet was received at the Unix time (in milliseconds) given in df[,4].
I need to calculate the time that passed between receiving and sending (this is the time my machine needs to analyze the RECEIVED PACKET and calculate a result to SEND).
An “END” in df[,3] means that the packet was sent successfully at Unix time df[,4].
The other case is “END” followed by “START”: the time that passed between my packet being sent and a new one being received.
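Both cases can be picked out without a loop by looking at the label of the row each difference starts from. A minimal sketch, assuming the rows strictly alternate between START and END, and inlining the first five rows of the measure.csv sample below so it runs without the file; the names `processing` and `idle` are illustrative only:

```r
# First five rows of measure.csv, inlined as text (no header row)
csv <- "08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238175202
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238175690
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238176195
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238176665
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238177167"
df <- read.csv(text = csv, header = FALSE)

times <- diff(df[, 4])       # times[i] = timestamp of row i + 1 minus row i
from  <- head(df[, 3], -1)   # label of the row each difference starts at

processing <- times[from == "START"]  # START followed by END: analyze and reply
idle       <- times[from == "END"]    # END followed by START: wait for next packet
print(processing)
print(idle)
```

Selecting by label rather than by position keeps working even if the file happens to start with an END row, as measure.csv does.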
Here is a sample CSV and my full code, for reproduction:
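# load the CSV into df (the file has no header row)
df <- read.csv("/tmp/measure.csv", header = FALSE)
# seconds elapsed since the first timestamp
absolute <- integer(nrow(df)); for (i in 1:nrow(df)) {time <- df[i, 4] - df[1, 4]; absolute[i] <- time / 1000}
# milliseconds until the next timestamp (NA for the last row)
times <- integer(nrow(df)); for (i in 1:nrow(df)) {time <- df[i + 1, 4] - df[i, 4]; times[i] <- time}
#plot(absolute,times)
plot(absolute, times, lty = 1, pch = 1, col = "#11223399", type = "l")
# horizontal red line at the mean gap
lines(absolute, array(mean(times, na.rm = TRUE), nrow(df)), col = "red")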
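For comparison, the same plot can be built without either loop, since subtraction and division work element-wise on whole columns. A sketch, assuming the same column layout and inlining the first five rows of measure.csv; it swaps the `lines(..., array(...))` call for `abline(h = ...)`, which draws a similar horizontal mean line across the full plot width, and `pdf(NULL)` is only there so the sketch runs without opening a graphics window:

```r
csv <- "08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238175202
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238175690
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238176195
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238176665
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238177167"
df <- read.csv(text = csv, header = FALSE)
ms <- df[, 4]

absolute <- (ms - ms[1]) / 1000  # seconds since the first timestamp, no loop
times <- c(diff(ms), NA)         # pad with NA so the length matches, as the loop did

pdf(NULL)                        # draw to a null device (no file, no window)
plot(absolute, times, lty = 1, col = "#11223399", type = "l")
abline(h = mean(times, na.rm = TRUE), col = "red")
invisible(dev.off())
```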
Here is my measure.csv:
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238175202
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238175690
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238176195
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238176665
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238177167
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238177669
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238178172
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238178639
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238179139
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238179658
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238180161
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238180654
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238181154
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238181669
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238182170
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238182629
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238183130
I hope this makes it more clear.
Hope I’m not too far off, but why not avoid the loop altogether?
I’m assuming the START and END times in your dataset are in order.
I’m not sure what kind of checks you need, since there were none in the for loop you posted in the question.
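One loop-free option, shown here as a sketch rather than as the exact code from this answer, is base R’s `diff()`, which returns `x[i + 1] - x[i]` for every `i` in a single call. It assumes the timestamps are in column 4 and inlines the first five rows of measure.csv:

```r
csv <- "08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238175202
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238175690
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238176195
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238176665
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238177167"
df <- read.csv(text = csv, header = FALSE)

# No preallocation and no index juggling; the result has one element fewer
# than df has rows, so there is no trailing NA either
times <- diff(df[, 4])
print(times)
```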
---------- EDIT ----------
Incorporating the comment below, which appears to have gotten it right:
this really was a question about indexing:
where:
gives you all the differences; you just wanted to split them into two objects, one for the even indices and another for the odd indices:
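A sketch of that split, assuming the differences come from `diff()` on column 4 and using the first five rows of measure.csv. A logical index such as `c(TRUE, FALSE)` is recycled along the whole vector, so it picks every other element whatever the length. Which parity holds which case depends on the first row: measure.csv starts with END, so there the odd positions are END-to-START gaps and the even positions are START-to-END processing times.

```r
csv <- "08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238175202
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238175690
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238176195
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238176665
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238177167"
df <- read.csv(text = csv, header = FALSE)
times <- diff(df[, 4])

odd  <- times[c(TRUE, FALSE)]  # positions 1, 3, ...: differences starting at END rows here
even <- times[c(FALSE, TRUE)]  # positions 2, 4, ...: differences starting at START rows here
print(odd)
print(even)
```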
Unrelated, but also useful: you used ‘absolute’ and ‘df’ as object names in your code, but ‘df’ is also a function in R (the density of the F distribution), so although it works, it’s better form to pick names that aren’t already taken. Glad you got what you were after!