I am trying to step through a vector to find the outliers, using the IQR to calculate a range. When I run this script looking for values to the right of the IQR I get results, but when I run it for the left I get the error `missing value where TRUE/FALSE needed`. How can I scrub out the TRUE and FALSE in my dataset?
Here is my script:
```r
data = c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
Q3 <- quantile(data, 0.75) ## gets the third quartile (0.75 quantile) of the vector
Q1 <- quantile(data, 0.25) ## gets the first quartile (0.25 quantile) of the vector
outliers_left <- (Q1 - 1.5*IQR(data))
outliers_right <- (Q3 + 1.5*IQR(data))
IQR <- IQR(data)
paste("the interquartile range is", IQR)
Q1 # quantile at 0.25
Q3 # quantile at 0.75
# show the range of numbers we have
paste("your range is", outliers_left, "through", outliers_right, "to determine outliers")
# count how many values there are, then pass this count into a loop to look for
# anything below outliers_left or above outliers_right
vectorCount <- sum(!is.na(data))
i <- 1
while( i <= vectorCount ){
  i <- i + 1
  x <- data[i]
  # if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
  if(x > outliers_right) {print(x)}
}
```
And the output, including the error, is:

```
[1] 167
[1] 180
[1] 156
Error in if (x > outliers_right) { :
  missing value where TRUE/FALSE needed
```
As you can see if you run this script, it finds my 3 outliers on the right and also throws the error. But when I run it again on the left of my IQR (and I do have an outlier of 100 in the vector), I just get the error without any other results being displayed.
How can I fix this script? Any help is greatly appreciated; I've been scouring the web and my books for days on how to fix this.
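As noted in the comments, the error comes from the way you've constructed your `while` loop. With `while (i <= vectorCount)`, `i` is incremented *before* `data[i]` is read, so at the last iteration `i == 16` even though there are only 15 elements to process. `data[16]` is `NA`, so `x > outliers_right` is also `NA`, and `if()` cannot branch on `NA`. That produces "missing value where TRUE/FALSE needed". Changing the condition to `i < vectorCount` removes the error. However, because the increment still comes first, the loop never looks at `data[1]`, which is exactly where your left outlier of 100 sits. Moving `i <- i + 1` to the end of the loop body, and keeping `i <= vectorCount`, fixes both problems.

However, this is really not how R works, and you'll soon be frustrated at how long that kind of loop takes to run on any appreciably sized data. R is "vectorized", meaning that you can operate on all 15 elements of `data` at once. To print your outliers, index `data` with a logical comparison: use `data[data < outliers_left]` for the left and `data[data > outliers_right]` for the right. To get all of them at once, combine the two comparisons with the OR operator: `data[data < outliers_left | data > outliers_right]`.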
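The sketch below places the corrected loop next to the vectorized version. It reuses the question's `data` and fence definitions and assumes `data` contains no `NA`. The names `loop_outliers`, `left_outliers`, `right_outliers` and `all_outliers` are illustrative only. Both approaches should pick up the same four values, including the 100 that the original loop skipped.

```r
# The question's data and IQR fences (same definitions as in the question)
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
Q1 <- quantile(data, 0.25)
Q3 <- quantile(data, 0.75)
outliers_left  <- Q1 - 1.5 * IQR(data)
outliers_right <- Q3 + 1.5 * IQR(data)

# Loop version: start at 1 and increment at the END of the body,
# so every index from 1 to length(data) is visited and data[i] is never NA
# (assumes data itself has no NA values)
loop_outliers <- c()
i <- 1
while (i <= length(data)) {
  x <- data[i]
  if (x < outliers_left || x > outliers_right) {
    loop_outliers <- c(loop_outliers, x)
  }
  i <- i + 1
}

# Vectorized version: one comparison over the whole vector, then extract with [
left_outliers  <- data[data < outliers_left]
right_outliers <- data[data > outliers_right]
all_outliers   <- data[data < outliers_left | data > outliers_right]

print(loop_outliers)
print(all_outliers)
```

The loop uses `length(data)` rather than `sum(!is.na(data))` because it indexes positions, not non-missing values. If `data` could contain `NA`, you would want to drop those first, for example with `data <- data[!is.na(data)]`.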
For a little context, the logical comparisons above create a `TRUE`/`FALSE` value for each element of `data`, and R returns only the elements where that value is `TRUE`. You can check this for yourself by typing `data < outliers_left` at the console. The `[` is actually an extraction operator, used to retrieve a subset of a data object. See the help page, `?"["`, for some good background.
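To see that intermediate step, here is a small sketch that again reuses the question's data. `is_outlier` is just an illustrative name. The sketch prints the logical vector, extracts with it, and calls `[` by name to show that it is an ordinary function. It then uses `which()` to get the positions of the outliers rather than their values.

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
outliers_left  <- quantile(data, 0.25) - 1.5 * IQR(data)
outliers_right <- quantile(data, 0.75) + 1.5 * IQR(data)

# The comparison alone returns one TRUE/FALSE per element of data
is_outlier <- data < outliers_left | data > outliers_right
print(is_outlier)

# [ keeps only the elements where the logical index is TRUE
print(data[is_outlier])

# [ is an ordinary function, so it can also be called by name
print(`[`(data, is_outlier))

# which() turns the logical vector into the positions of the TRUE values
print(which(is_outlier))
```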