I’m so new to R that I’m having trouble finding what I need in other people’s questions. I think my question is so easy that nobody else has bothered to ask it.
What is the simplest code to create a new data frame that excludes rows that are univariate outliers on a certain variable, within their condition? I’m defining an outlier as a point more than 3 SDs from its condition’s mean.
I’m embarrassed to show what I’ve tried, but here it is:
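# upper and lower cutoffs for condition "one" (mean +/- 2.5 SD)
greaterthan <- mean(dat$var2[dat$condition=="one"]) +
  2.5*(sd(dat$var2[dat$condition=="one"]))
lessthan <- mean(dat$var2[dat$condition=="one"]) -
  2.5*(sd(dat$var2[dat$condition=="one"]))
# this gives TRUE/FALSE for each value in condition "one", not the filtered data
withoutliersremovedone1 <- dat$var2[dat$condition=="one"] < greaterthan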
and I’m pretty much already stuck there.
Thanks
To drop outliers, return only those rows whose values are not (!) more than 2 absolute SDs from the mean of the variable in question. Change 2 to however many SDs you want the cutoff to be. A shorter way to get the same distances is the scale function, which centers a variable on its mean and divides it by its SD.
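A minimal sketch of both forms, run on a small hypothetical data frame (dat, with a grouping column var1 and a measurement var2; the values are made up for illustration). The cutoff variable is also an illustrative assumption; it holds the number of SDs.

```r
# Hypothetical example data: var1 labels the group (condition),
# var2 is the measurement to screen for outliers.
dat <- data.frame(
  var1 = rep(c("a", "b"), each = 10),
  var2 = c(1, 2, 3, 1, 2, 3, 2, 1, 2, 100,
           10, 11, 12, 10, 11, 12, 11, 10, 11, 20)
)

cutoff <- 2  # number of SDs; change to 3 for a 3-SD rule

# Explicit version: absolute distance from the overall mean, in SD units.
z <- abs(dat$var2 - mean(dat$var2)) / sd(dat$var2)
trimmed <- dat[!(z > cutoff), ]

# Shorthand: scale() centers and divides by the SD, returning a one-column
# matrix; [, 1] turns it back into a plain vector before indexing.
trimmed2 <- dat[!(abs(scale(dat$var2)[, 1]) > cutoff), ]

nrow(trimmed)  # 19: only the row with var2 == 100 is dropped
```

Both subsets are identical here. Note that this compares every row with the mean and SD of the whole column, ignoring var1.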
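This can be extended to look within groups using the by function, which applies the filter separately to each group so that each row is compared with its own group’s mean and SD. This assumes dat$var1 is the variable defining the group each row belongs to (dat$condition in the question).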
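A sketch of the grouped version on the same hypothetical data. It uses by() plus do.call(rbind, ...), and, as an alternative not mentioned above, ave(), which computes a per-group value for every row and keeps the original row order. The names pieces, trimmed and trimmed_ave are illustrative only.

```r
# Hypothetical example data: var1 = group, var2 = measurement.
dat <- data.frame(
  var1 = rep(c("a", "b"), each = 10),
  var2 = c(1, 2, 3, 1, 2, 3, 2, 1, 2, 100,
           10, 11, 12, 10, 11, 12, 11, 10, 11, 20)
)

cutoff <- 2  # number of SDs from the group mean

# by() splits dat by var1 and applies the function to each group's rows,
# so scale() uses that group's own mean and SD.
pieces <- by(dat, dat$var1, function(x) {
  x[!(abs(scale(x$var2)[, 1]) > cutoff), ]
})
trimmed <- do.call(rbind, pieces)  # stack the filtered groups back together

# Alternative: ave() returns a per-group |z| for every row, in the original order.
z <- ave(dat$var2, dat$var1, FUN = function(v) abs(v - mean(v)) / sd(v))
trimmed_ave <- dat[!(z > cutoff), ]

nrow(trimmed)  # 18: var2 == 100 (group a) and var2 == 20 (group b) are dropped
```

Here the value 20 is not an outlier relative to the whole column, but it is more than 2 SDs from group b’s mean, so only the grouped filter drops it. One caveat for the 3-SD rule in the question: with n observations in a group, no value can be more than (n - 1)/sqrt(n) sample SDs from the group mean, so a group of 10 or fewer rows can never contain a 3-SD outlier.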