I’ve got a data frame that contains several interleaved values that occurred in a timeline. I’d like to create a new data frame that contains line numbers (row IDs, basically), a file descriptor, operation and a “size” value.
Example:
  line fd syscall       size
1    1  1   lseek 1289020416
2    2  1   lseek 1289021440
3    3  2   lseek 1289024512
4    4  1   lseek 1289025536
5    5  2   lseek 1289026560
6    6  1   lseek 1289027584
I’d like to compute a diff of the size values per fd and show the starting point of the diff. The diff function itself throws away a lot of data. Is there something similar that will help me have context (e.g. where the beginning of each line was)?
I’d like results that look like the following where I know how far each fd has moved since the previous line, and what the previous line was.
  line fd diff
1    1  1 1024
2    2  1 4096
3    3  2 2048
4    4  1 2048
Is there something I can do that’s easier than tearing it all apart and looping? I have to believe someone has a slightly better diff out there.
Example input:
structure(list(line = 1:6, fd = c(1, 1, 2, 1, 2, 1), syscall = structure(c(1L,
1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "lseek"), size = c(1289020416,
1289021440, 1289024512, 1289025536, 1289026560, 1289027584)), .Names = c("line",
"fd", "syscall", "size"), row.names = c(NA, -6L), class = "data.frame")
Use plyr's ddply to cut the data.frame into pieces by fd, and transform to attach the new vector of differences to each piece.
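As a minimal sketch of that approach, assuming the plyr package is installed: the names `df`, `moves`, `result` and the column `moved` are mine, not from the question, and each row is paired with the next row for the same fd so the output keeps the starting line of every move.

```r
library(plyr)

# Example input from the question, rebuilt as a plain data.frame
df <- data.frame(
  line    = 1:6,
  fd      = c(1, 1, 2, 1, 2, 1),
  syscall = factor(rep("lseek", 6)),
  size    = c(1289020416, 1289021440, 1289024512,
              1289025536, 1289026560, 1289027584)
)

# Split by fd; within each piece, attach the distance to the next
# offset seen for the same fd. The last row of each fd has no
# successor, so it is padded with NA to keep the lengths equal.
moves <- ddply(df, .(fd), transform, moved = c(diff(size), NA))

# ddply returns the rows grouped by fd, so restore timeline order
# and drop the rows that have no following seek.
moves <- moves[order(moves$line), ]
result <- moves[!is.na(moves$moved), c("line", "fd", "moved")]
rownames(result) <- NULL
result
```

Padding with NA at the end (rather than at the start) is what ties each difference to the line where the move began; use `c(NA, diff(size))` instead if you would rather label each difference with the line where the move ended.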