I am looking to remove multiple observations from one column within a dataframe based on their value without affecting the rest of the row.
df1=data.frame(c("male","female","male"),seq(1,30),seq(11,40))
names(df1) = c("col_a","col_b","col_c")
For example, I want to remove the values in col_b that are below 5 or above 20 without affecting col_a or col_c. I then want to use this data for descriptive analysis and summaries.
Currently I am using this code to do the job:
df1$col_b[df1$col_b<5|df1$col_b>20] <- ""
df1$col_b<-as.numeric(df1$col_b)
However, this creates NA values, which get in the way of the analysis. Is there a way of doing this that does not create NA values, or a quick way of removing them without affecting the row?
A numeric column can have normal values, NA, Inf, -Inf and NaN. But "empty" is not a possible value. The reason for having NA is to mark that the value isn't available, which seems to be exactly what you want! Using a negative number is just a more awkward way of doing the same thing: you'd have to remove all negative numbers before calculating mean, sum, etc. You can do the same thing with NA, and that functionality is typically built into the functions by specifying na.rm=TRUE.
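As a minimal sketch of that approach, using the example data from the question: the out-of-range values can be set to NA directly (skipping the detour through "" and as.numeric), and the NA values are then ignored at analysis time. The names out_of_range and kept are only illustrative and do not appear in the original post.

```r
# Rebuild the example data frame from the question
df1 <- data.frame(
  col_a = c("male", "female", "male"),  # recycled to fill 30 rows
  col_b = seq(1, 30),
  col_c = seq(11, 40)
)

# Mark values of col_b outside [5, 20] as missing; the column stays numeric
# and col_a / col_c are left untouched
out_of_range <- df1$col_b < 5 | df1$col_b > 20  # illustrative helper name
df1$col_b[out_of_range] <- NA

# Most summary functions can skip NA values via na.rm = TRUE
mean(df1$col_b, na.rm = TRUE)  # 12.5
sum(df1$col_b, na.rm = TRUE)   # 200

# summary() reports the number of NA values separately
summary(df1$col_b)

# For functions without an na.rm argument, drop the NA values from a copy
kept <- df1$col_b[!is.na(df1$col_b)]  # illustrative helper name
length(kept)                          # 16
```

The rows themselves are never removed, so col_a and col_c still have all 30 observations available for their own summaries.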