I have the following Stata code, which I am trying to convert to R:
Data (`dataframe`):
y1 y2 y3 y4 y5 y6 y11 y12 y13 y14 y15 y16
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 2 1 2 0 0
0 0 0 0 0 0 1 1 1 2 0 0
0 0 0 0 0 0 1 8 1 2 0 0
0 0 0 0 0 0 1 1 1 2 0 0
0 0 0 0 0 0 1 1 1 2 0 0
1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
2 2 5 1 1 2 2 2 1 1 2 1
local z1 "y1 y12 y3 y4 y5 y6"
local z2 "y11 y12 y13 y14 y15 y16"
local i = 1
local n : word count `z1'
gen k=.
while `i'<=`n' {
local z1`i' : word `i' of `z1'
local z2`i' : word `i' of `z2'
replace k=max(0,`z1`i'')*(`z2`i''==2|`z2`i''==1)
local i=`i'+1
}
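Two details matter when carrying the `replace` line over to R. Stata evaluates `max(0, x)` separately for each observation, so the R counterpart is the elementwise `pmax()`. R's `max()` instead collapses the whole column to a single number. Stata also treats a missing value as not equal to 1 or 2, whereas R's `==` propagates `NA`; `%in%` returns `FALSE` in that case. Here is a small illustration, using made-up vectors `x` and `y`:

```r
x <- c(5, 0, -3, NA)

max(0, x)                 # NA: one value for the whole vector
max(0, x, na.rm = TRUE)   # 5: still one value for the whole vector
pmax(0, x)                # 5 0 0 NA: elementwise, like Stata's replace
pmax(0, x, na.rm = TRUE)  # 5 0 0 0: also skips NA, as Stata's max() skips missing values

y <- c(1, 2, 3, NA)
y == 2 | y == 1           # TRUE TRUE FALSE NA
y %in% c(1, 2)            # TRUE TRUE FALSE FALSE: NA counts as "no match"
```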
Expected output:
k
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
2
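Each pass of the loop replaces `k` without using its previous value. As a result, only the last pair of words (`y6` and `y16`) determines the result. That is why only the last observation, where `y6` is 2 and `y16` is 1, is nonzero. A quick check of this property in R, on hypothetical random data with the same column names (the data and the seed are made up for illustration):

```r
set.seed(1)
cols <- c(paste0("y", 1:6), paste0("y", 11:16))
# hypothetical data: 50 rows of random integers 0..3
dataframe <- as.data.frame(matrix(sample(0:3, 50 * 12, replace = TRUE),
                                  ncol = 12, dimnames = list(NULL, cols)))
z1 <- c("y1", "y12", "y3", "y4", "y5", "y6")
z2 <- c("y11", "y12", "y13", "y14", "y15", "y16")

k <- rep(NA_real_, nrow(dataframe))
for (i in seq_along(z1)) {
  # each pass overwrites k; the previous value of k is never used
  k <- pmax(0, dataframe[[z1[i]]], na.rm = TRUE) * (dataframe[[z2[i]]] %in% c(1, 2))
}

# only the sixth pair of words survives
k_sixth <- pmax(0, dataframe$y6, na.rm = TRUE) * (dataframe$y16 %in% c(1, 2))
identical(k, k_sixth)  # TRUE
```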
I tried the following R equivalent:
dataframe$z1 <- "y1 y12 y3 y4 y5 y6"
dataframe$z2 <- "y11 y12 y13 y14 y15 y16"
i <- 1
n <- sapply(gregexpr("\\W+", dataframe$z1[1]), length) + 1
dataframe$k <- NA
for (j in i:n) {
  .... # I wanted to refer to each word of z1,
  ...  # e.g., dataframe$z1[j], which would give word j of z1
  ..   # I wanted to refer to each word of z2,
  ...  # e.g., dataframe$z2[j], which would give word j of z2
  dataframe$k <- with(dataframe, pmax(0, z1[j]) * ifelse(z2[j] %in% c(1, 2), 1, 0))
}
The dotted lines mark the places where I could not find the equivalent R code. I would appreciate it if you could help.
# Updated Stata code and data (number of observations reduced to 10)
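The missing piece is "word i of z1". A minimal approach in R is to keep the word lists as plain character vectors rather than as data-frame columns. `strsplit()` splits the string once, `z1[i]` is word i, `length(z1)` plays the role of `word count`, and `dataframe[[z1[i]]]` fetches the column whose name is stored in `z1[i]`. The sketch below is a literal translation of the loop under those assumptions. It reads the data from the text above and reproduces the expected output:

```r
# Data from the question (same 20 observations)
dataframe <- read.table(header = TRUE, text = "
y1 y2 y3 y4 y5 y6 y11 y12 y13 y14 y15 y16
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 2 1 2 0 0
0 0 0 0 0 0 1 1 1 2 0 0
0 0 0 0 0 0 1 8 1 2 0 0
0 0 0 0 0 0 1 1 1 2 0 0
0 0 0 0 0 0 1 1 1 2 0 0
1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
2 2 5 1 1 2 2 2 1 1 2 1
")

# local z1 "..." / local z2 "...": keep the words as character vectors
z1 <- strsplit("y1 y12 y3 y4 y5 y6", "\\s+")[[1]]
z2 <- strsplit("y11 y12 y13 y14 y15 y16", "\\s+")[[1]]
n <- length(z1)  # local n : word count `z1'

dataframe$k <- NA_real_  # gen k=.
for (i in seq_len(n)) {
  # z1[i] is word i of z1; [[ ]] looks the column up by that name
  dataframe$k <- pmax(0, dataframe[[z1[i]]], na.rm = TRUE) *
    (dataframe[[z2[i]]] %in% c(1, 2))
}
dataframe$k
```

The `na.rm = TRUE` and `%in%` choices are assumptions meant to mimic Stata's handling of missing values. This data has none, so they do not change the result here.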
y1 y2 y3 y4 y5 y6 y11 y12 y13 y14 y15 y16
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
(remaining columns, same 10 observations)
y111 y112 y113 y114 y115 y116 y1111 y1112 y1113 y1114 y1115 y1116
1 0 0 0 0 0 81000 0 0 0 0 0
1 0 0 0 0 0 86000 0 0 0 0 0
1 0 0 0 0 0 96000 0 0 0 0 0
1 0 0 0 0 0 84000 0 0 0 0 0
1 0 0 0 0 0 76000 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
local z1 "y1 y2 y3 y4 y5 y6"
local z2 "y11 y12 y13 y14 y15 y16"
local z3 "y111 y112 y113 y114 y115 y116"
local z4 "y1111 y1112 y1113 y1114 y1115 y1116"
local i = 1
local n : word count `z1'
gen k=.
gen r=0
gen s=0
gen t=0
while `i'<=`n' {
local z1`i' : word `i' of `z1'
local z2`i' : word `i' of `z2'
local z3`i' : word `i' of `z3'
local z4`i' : word `i' of `z4'
replace k=max(0,`z4`i'')*(`z1`i''==5|`z1`i''==10|`z2`i''==2|`z2`i''==1|`z3`i''==1)
replace r=r+k if `i'<=3
replace s=s+k if `i'>3
replace t=t+k
local i=`i'+1
}
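A line-by-line R sketch of the updated loop, using the same building blocks (`strsplit()`, `[[ ]]`, `pmax()`, `%in%`). The data frame is rebuilt from the ten observations above, where only `y1`, `y111` and `y1111` are nonzero. The Stata `if` qualifiers on `r` and `s` become an ordinary `if`/`else` on the loop counter. As an assumption, `pmax(..., na.rm = TRUE)` and `%in%` are used to mimic Stata's handling of missing values, which this data does not contain.

```r
# Rebuild the 10 x 24 data: all zeros except y1, y111 and y1111 in rows 1-5
cols <- c(paste0("y", 1:6), paste0("y", 11:16),
          paste0("y", 111:116), paste0("y", 1111:1116))
dataframe <- as.data.frame(matrix(0, nrow = 10, ncol = length(cols),
                                  dimnames = list(NULL, cols)))
dataframe$y1[1:5]    <- 5
dataframe$y111[1:5]  <- 1
dataframe$y1111[1:5] <- c(81000, 86000, 96000, 84000, 76000)

# hypothetical helper: split a Stata-style word list into a character vector
words <- function(s) strsplit(s, "\\s+")[[1]]
z1 <- words("y1 y2 y3 y4 y5 y6")
z2 <- words("y11 y12 y13 y14 y15 y16")
z3 <- words("y111 y112 y113 y114 y115 y116")
z4 <- words("y1111 y1112 y1113 y1114 y1115 y1116")

dataframe$k <- NA_real_  # gen k=.
dataframe$r <- 0         # gen r=0
dataframe$s <- 0         # gen s=0
dataframe$t <- 0         # gen t=0
for (i in seq_along(z1)) {
  cond <- dataframe[[z1[i]]] %in% c(5, 10) |
          dataframe[[z2[i]]] %in% c(1, 2) |
          dataframe[[z3[i]]] %in% 1
  dataframe$k <- pmax(0, dataframe[[z4[i]]], na.rm = TRUE) * cond
  if (i <= 3) {
    dataframe$r <- dataframe$r + dataframe$k  # replace r=r+k if `i'<=3
  } else {
    dataframe$s <- dataframe$s + dataframe$k  # replace s=s+k if `i'>3
  }
  dataframe$t <- dataframe$t + dataframe$k    # replace t=t+k
}
dataframe[, c("t", "r", "s", "k")]
```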
# Expected output
t r s k
81000 81000 0 0
86000 86000 0 0
96000 96000 0 0
84000 84000 0 0
76000 76000 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
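The same result can also be computed without updating columns inside a loop. In this sketch, `mapply()` walks the four word lists in parallel and returns a 10-by-6 matrix whose column i holds the value `k` takes on pass i. `rowSums()` over the appropriate columns then gives `r`, `s` and `t`, and the last column is the final `k`. The data is rebuilt from the ten observations above. As an assumption, Stata's missing-value handling is mimicked with `na.rm = TRUE` and `%in%`.

```r
cols <- c(paste0("y", 1:6), paste0("y", 11:16),
          paste0("y", 111:116), paste0("y", 1111:1116))
dataframe <- as.data.frame(matrix(0, nrow = 10, ncol = length(cols),
                                  dimnames = list(NULL, cols)))
dataframe$y1[1:5]    <- 5
dataframe$y111[1:5]  <- 1
dataframe$y1111[1:5] <- c(81000, 86000, 96000, 84000, 76000)

# Same word lists as the Stata locals z1..z4
z1 <- paste0("y", 1:6)
z2 <- paste0("y", 11:16)
z3 <- paste0("y", 111:116)
z4 <- paste0("y", 1111:1116)

# One column per pass of the Stata loop: the value k takes on that pass
K <- mapply(function(w1, w2, w3, w4) {
  pmax(0, dataframe[[w4]], na.rm = TRUE) *
    (dataframe[[w1]] %in% c(5, 10) |
     dataframe[[w2]] %in% c(1, 2) |
     dataframe[[w3]] %in% 1)
}, z1, z2, z3, z4)

dataframe$r <- rowSums(K[, 1:3])  # passes with i <= 3
dataframe$s <- rowSums(K[, 4:6])  # passes with i > 3
dataframe$t <- rowSums(K)         # all passes
dataframe$k <- K[, ncol(K)]       # k keeps only the last pass
dataframe[, c("t", "r", "s", "k")]
```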
Nick makes a good point that your `max` call doesn’t reference the previous `k`, so the loop collapses to a check of the sixth column. Here’s the R equivalent, assuming you really wanted the row maximum. I stored your data in a txt file first. This yields:
Update: here’s the solution to your updated, full problem. It uses more or less the same building blocks. Once you’re familiar with the basics of R, I think you will get the most mileage out of `apply`, `lapply`, and `mapply`. This yields: