I have a data frame like the one below, built from the string x.
x<-"start.x stop.x strand.x start.y stop.y strand.y
1 16954189 16963562 - 16954189 16963562 -
2 16954189 16963562 - 150045170 150065177 -
3 150045170 150065177 - 16954189 16963562 -
4 150045170 150065177 - 150045170 150065177 -
5 97061519 97190927 - 97061519 97190927 -
6 97061519 97190927 - 135190856 135202610 +
7 135190856 135202610 + 97061519 97190927 -
8 135190856 135202610 + 135190856 135202610 +"
dat <- read.table(textConnection(x), header=TRUE)
Normally I calculate for each row the relative distance between start.x and start.y with the following code:
zz <- transform(dat,
distance_startsite = abs(as.numeric(start.x) - as.numeric(start.y)))
But this time, before calculating the distance, we first need to look at strand.x and strand.y.
- If the strand.x is “-” the official start site is stop.x
- If the strand.x is “+” the official start site is start.x
- If the strand.y is “-” the official start site is stop.y
- If the strand.y is “+” the official start site is start.y
For example, row 1 in table dat must calculate abs(as.numeric(stop.x) - as.numeric(stop.y)) instead of abs(as.numeric(start.x) - as.numeric(start.y)).
My question is, is there a way to calculate this for each row like zz?
Thanks
EDIT: my first thought was something like this:
for (i in 1:nrow(dd)){
if (dat$strand.x[i,] == "-" & dat$stand.y[i,] == "-") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(stop.x[i,]) - as.numeric(stop.y[i,]))} else
if (dat$strand.x[i,] == "+" & dat$stand.y[i,] == "-") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(start.x[i,]) - as.numeric(stop.y[i,]))} else
if (dat$strand.x[i,] == "-" & dat$stand.y[i,] == "+") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(stop.x[i,]) - as.numeric(start.y[i,]))} else
if (dat$strand.x[i,] == "+" & dat$stand.y[i,] == "+") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(start.x[i,]) - as.numeric(start.y[i,]))}
}
But that doesn’t work yet.
If you do this step by step and use some interim variables, you will save yourself a lot of trouble and your code will become much clearer.
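Here is what I suggest: store the official start site for x in an interim column, do the same for y, and then take the absolute difference of the two.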
One further observation: there is no need to call as.numeric all the time, because read.table already reads the coordinate columns as numbers.
The code:
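A minimal sketch of the step-by-step approach, assuming ifelse is used to pick the strand-aware start site. The interim column names official.x and official.y are my own choice for illustration and are not required by anything in the question:

```r
# Sample data from the question; the leading 1..8 become row names
x <- "start.x stop.x strand.x start.y stop.y strand.y
1 16954189 16963562 - 16954189 16963562 -
2 16954189 16963562 - 150045170 150065177 -
3 150045170 150065177 - 16954189 16963562 -
4 150045170 150065177 - 150045170 150065177 -
5 97061519 97190927 - 97061519 97190927 -
6 97061519 97190927 - 135190856 135202610 +
7 135190856 135202610 + 97061519 97190927 -
8 135190856 135202610 + 135190856 135202610 +"
dat <- read.table(text = x, header = TRUE, stringsAsFactors = FALSE)

# Interim columns (hypothetical names): on the "-" strand the official
# start site is the stop coordinate, on the "+" strand it is the start.
dat$official.x <- ifelse(dat$strand.x == "-", dat$stop.x, dat$start.x)
dat$official.y <- ifelse(dat$strand.y == "-", dat$stop.y, dat$start.y)

# Distance between the two official start sites, one value per row
dat$distance_startsite <- abs(dat$official.x - dat$official.y)
```

Because ifelse is vectorised, this handles all four strand combinations without a loop or nested if/else.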
The results:
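For the sample data in the question, the strand-aware distance should be 0 for rows 1, 4, 5 and 8, 133101615 for rows 2 and 3, and 37999929 for rows 6 and 7. A self-contained sketch that recomputes and checks those values with transform, as in the question (the names res and expected are my own):

```r
x <- "start.x stop.x strand.x start.y stop.y strand.y
1 16954189 16963562 - 16954189 16963562 -
2 16954189 16963562 - 150045170 150065177 -
3 150045170 150065177 - 16954189 16963562 -
4 150045170 150065177 - 150045170 150065177 -
5 97061519 97190927 - 97061519 97190927 -
6 97061519 97190927 - 135190856 135202610 +
7 135190856 135202610 + 97061519 97190927 -
8 135190856 135202610 + 135190856 135202610 +"
dat <- read.table(text = x, header = TRUE, stringsAsFactors = FALSE)

# Same calculation in a single transform call, without interim columns
res <- transform(dat, distance_startsite = abs(
  ifelse(strand.x == "-", stop.x, start.x) -
  ifelse(strand.y == "-", stop.y, start.y)))

# Distances worked out by hand from the rules in the question
expected <- c(0, 133101615, 133101615, 0, 0, 37999929, 37999929, 0)
all(res$distance_startsite == expected)  # TRUE
```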