I’m dealing with a data set that has some obvious errors in the data

Question


Asked: May 27, 2026


I’m dealing with a data set that has some obvious errors in it (e.g., a child under 1 year old with a $50,000 credit card balance). I can’t go through it line by line because the set has more than 100k rows. Has any formal work been done on how to search for these kinds of obvious problems in data sets, or, even better, are there any R packages for it? Or should I just start making histograms?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T10:07:14+00:00

As far as I know, there is no package that does exactly this; what you’re asking for seems very specialized. I think what you’re really looking for are anomalies or outliers. It would be nice, though, to have something that regressed each variable on the others and searched for potential extreme outliers (probably not that hard to build).
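The "regress each variable on the others" idea could be sketched roughly as follows. This is only an illustration, not an existing package: `flag_by_regression` is a hypothetical helper, the simulated columns are made up, and it assumes numeric columns with syntactic names and no missing values.

```r
# Hypothetical helper: regress each numeric column on all the other numeric
# columns and collect rows whose standardized residual exceeds a cutoff.
flag_by_regression <- function(df, cutoff = 3) {
  num <- df[sapply(df, is.numeric)]
  hits <- lapply(names(num), function(v) {
    # Build the formula v ~ (all other numeric columns).
    fit <- lm(reformulate(setdiff(names(num), v), response = v), data = num)
    z <- rstandard(fit)
    idx <- which(abs(z) > cutoff)
    if (length(idx) == 0) return(NULL)
    data.frame(row = names(idx), variable = v, z = unname(z[idx]),
               row.names = NULL)
  })
  do.call(rbind, hits)
}

# Simulated data (assumed, for illustration only) with one planted bad record.
set.seed(42)
demo <- data.frame(age = runif(500, 18, 80))
demo$income <- 1000 * demo$age + rnorm(500, sd = 5000)
demo$balance <- 0.1 * demo$income + rnorm(500, sd = 1000)
demo <- rbind(demo, data.frame(age = 0.5, income = 0, balance = 50000))

res <- flag_by_regression(demo)
print(res)
```

Each row of the result names a flagged observation, the variable that was being predicted, and its standardized residual. A few ordinary rows will usually show up as well, so the output is a list of candidates to inspect rather than a verdict.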

A few thoughts:

1) Make a scatterplot of variables you’d expect to be connected, such as age and income. Even with 100k rows, a point like that one (a 1-year-old making 50K) would show up far away from all the others.

2) Run a regression and look at the diagnostic plots of the model, e.g. plot(model). They are pretty good for spotting outliers.

3) Search through the standardized residuals for values more than 2, or more likely 3, standard deviations from zero, using a which() statement that returns the row indices of those observations.

Something like: dataframe[which(abs(rstandard(model)) > 3), ]
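To see the three steps above working together, here is a small sketch on simulated data. The column names, the linear relationship, and the planted record are assumptions for illustration, not the asker's actual data.

```r
# Simulated data; names and values are illustrative assumptions.
set.seed(1)
n <- 1000
age <- sample(18:80, n, replace = TRUE)
balance <- 200 + 50 * age + rnorm(n, sd = 500)
dat <- data.frame(age = age, balance = balance)

# Plant one obviously bad record: an infant with a $50,000 balance.
dat <- rbind(dat, data.frame(age = 0, balance = 50000))

# 1) Scatterplot: the bad record sits far away from the main cloud.
plot(dat$age, dat$balance, xlab = "age", ylab = "balance")

# 2) Fit a simple linear model (plot(model) would show the diagnostic plots).
model <- lm(balance ~ age, data = dat)

# 3) Keep rows whose standardized residual is more than 3 SDs from zero.
flagged <- dat[which(abs(rstandard(model)) > 3), ]
print(flagged)
```

Taking the absolute value catches observations that are far too low as well as far too high. The cutoff of 3 is a judgment call; with 100k rows, even clean data will put some points past it.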
