I was working previously with SAS and then decided to shift to R for

Question


Asked: June 13, 2026 (2026-06-13T23:19:23+00:00)

I previously worked with SAS and then decided to switch to R for academic reasons.
My data (healthdemo) are health data containing diagnostic codes (ICD-10), and I want to separate these codes into different columns. This is part of str(healthdemo):

$ PATIENT_KEY     : int  7391510 7404298 7390196 7381208 7401691 7381223 7383005 10188634 7384574 7398317 ...
 $ ICDCODE         : Factor w/ 1125 levels "","H00","H00.0",..: 654 56 654 654 665 48 90 679 654 654 ...
 $ PATIENT_ID      : int  39387 50244 38388 27346 49922 27901 27867 61527 33186 45309 ...
 $ DATE_OF_BIRTH   : Factor w/ 14801 levels "","01/01/1000",..: 7506 10250 52 73 94 6130 85 2710 95 100 ...

The ICDCODE column contains many diseases, from H00 to J99. First, I separated the letters from the numbers in ICDCODE:

healthdemo$icd_char = substr(healthdemo$ICDCODE,1,1)
healthdemo$icd_num = substr(healthdemo$ICDCODE,2,2)

Then I created the disease columns like this:

healthdemo$cvd = 0
healthdemo$ihd = 0
healthdemo$mi = 0
healthdemo$dys = 0
healthdemo$afib = 0
healthdemo$chf = 0

Now I want to apply logic similar to this SAS code (which I used to use):

if icd_char = 'I' and 01 <= icd_num < 52 then cvd = 1;

if icd_char = 'I' and 20 <= icd_num <= 25 then ihd = 1;

if icd_char = 'I' and 21 <= icd_num <= 22 then mi = 1;

if icd_char = 'I' and 46 <= icd_num <= 49 then dys = 1;

if icd_char = 'I' and icd_num = 48 then afib = 1;

This code flags each patient whose ICD letter and ICD number fall in the given range, e.g. by setting cvd = 1, and so on for the other columns.

I tried the following in R, but neither worked for me:

healthdemo$cvd[healthdemo$icd_char == 'I' & 01 <= healthdemo$icd_num 
      & healthdemo$icd_num < 52 ] <- 1

and this

if (healthdemo$icd_char == "I" &  01 < = healthdemo$icd_num < 52  )
   {healthdemo$cvd <- 1}

Could somebody help me, please?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T23:19:24+00:00

I had a similar struggle when I transitioned from SAS to R for health-related research. My solution was to let go of the “if…then” approach as much as possible and take advantage of R’s native programming capabilities. Here are two approaches to your problem.

First, you can use indexing to find and replace elements. Here is some hospital discharge data of the kind you describe:

hosp<-read.csv(file="http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/resources/R/sparcsShort.csv",stringsAsFactors=FALSE)
head(hosp)

Say I want to identify every birth-related diagnosis in Manhattan. I first create a logical vector that returns a series of TRUES and FALSES for my search criteria, then I index my data frame by that logical vector. In this case I am also restricting the columns or variables I want returned:

myObs<-hosp$county==59 & hosp$pdx=="V3000 " #note space
myVars<-c("age", "sex", "disp")
myFile<-hosp[myObs,myVars]
head(myFile)
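
The same indexing idea maps onto the ICD-10 flags in your question. What follows is only a minimal sketch: the small healthdemo data frame is made up for illustration, and I am assuming every code has the form letter + two digits, optionally followed by a sub-code such as ".0". Two details are likely to matter with your real data. ICDCODE is a factor, so I convert it with as.character() first, and substr(x, 2, 3) rather than substr(x, 2, 2) is needed to pick up both digits before turning them into numbers.

```r
# Hypothetical toy data standing in for the real healthdemo data frame
healthdemo <- data.frame(
  PATIENT_KEY = 1:6,
  ICDCODE = factor(c("I21.0", "I48", "H00.0", "I50", "J99", ""))
)

code <- as.character(healthdemo$ICDCODE)            # factor -> character
healthdemo$icd_char <- substr(code, 1, 1)           # leading letter
healthdemo$icd_num  <- as.integer(substr(code, 2, 3))  # two digits as a number

# Logical vector: TRUE for "I" codes; missing numbers count as no match
num  <- healthdemo$icd_num
is_I <- !is.na(num) & healthdemo$icd_char == "I"

# Indexing, as above: start at 0, then overwrite the matching rows
healthdemo$cvd <- 0
healthdemo$cvd[is_I & num >= 1 & num < 52] <- 1

# Equivalent shortcut: convert the logical vector straight to 0/1
healthdemo$ihd  <- as.integer(is_I & num >= 20 & num <= 25)
healthdemo$mi   <- as.integer(is_I & num >= 21 & num <= 22)
healthdemo$dys  <- as.integer(is_I & num >= 46 & num <= 49)
healthdemo$afib <- as.integer(is_I & num == 48)

healthdemo
```

Note that each condition is written as two separate comparisons joined by &; R has no chained form like 1 <= x < 52. The if (...) {...} version from the question also cannot work this way, because if expects a single TRUE or FALSE rather than one value per row.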

The second, and perhaps more computationally elegant, approach is to use a function like “grep”. Say you’re interested in identifying all substance abuse diagnoses, e.g. alcohol abuse (291, 303, 305 and sub-codes), opioids, cannabis, amphetamines, hallucinogens, and cocaine (304 and related sub-codes), or non-specific substance abuse-related diagnoses (292). In SAS you would write out a long if-then statement (or a more efficient array) of some kind:

#/*********************** SUBSTANCE ABUSE *****************/
#if pdx in /* use ICD9 codes to create diagnoses */ ('2910','2911','2912','2913','2914','2915',
#   '29181','29189', '2919','2920','29211','29212','2922','29281','29282','29283', #........etc....,'30592','30593')
#Then subst_ab=1; 
#Else subst_ab=0;

In R, you can instead write:

# match codes that start with 291, 292, 303, 304 or 305 and have at least one more digit
substance<-grep("^(291|292|303|304|305)[0-9]", hosp$pdx)
hosp$pdx[substance]
hosp$subsAb<-"No"
hosp$subsAb[substance]<-"Yes"
hosp$subsAb[1:100]

table(hosp$subsAb)
plot(table(hosp$subsAb))

library(ggplot2)
ggplot(hosp, aes(x=subsAb, y=age)) + geom_point(alpha=1/50)
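
The pattern-matching approach carries over to the ICD-10 ranges in your question as well. Here is a hedged sketch using grepl, which returns TRUE/FALSE for each element instead of positions. The codes vector is invented for illustration, and the patterns assume codes are written as a letter followed by two digits, as in your str() output.

```r
# Hypothetical ICD-10 codes for illustration
codes <- c("I21.0", "I48", "H00.0", "I50", "J99", "I05.1")

cvd  <- grepl("^I(0[1-9]|[1-4][0-9]|5[01])", codes)  # I01 to I51
ihd  <- grepl("^I2[0-5]", codes)                     # I20 to I25
mi   <- grepl("^I2[12]", codes)                      # I21 to I22
dys  <- grepl("^I4[6-9]", codes)                     # I46 to I49
afib <- grepl("^I48", codes)                         # I48 only

# Collect the flags as 0/1 columns, as in the SAS version
flags <- data.frame(
  codes,
  cvd  = as.integer(cvd),
  ihd  = as.integer(ihd),
  mi   = as.integer(mi),
  dys  = as.integer(dys),
  afib = as.integer(afib)
)
flags
```

With this version there is no need to split the code into a letter and a number first, though the numeric comparison may be easier to read for wide ranges such as I01 to I51.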

Tomas Aragon has written a wonderful introduction to R for epidemiologists that goes into these approaches in detail. (http://www.medepi.net/docs/ph251d_fall2012_epir-chap01-04.pdf)
