As in the title, I’m trying to find a Stata equivalent of R’s `complete.cases` function.
The closest I have got so far is to use
generate sample = e(sample)
after running a regression, and then either deleting cases or using an `if` clause on this newly generated variable (solution stolen from here).
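A minimal sketch of that workflow, for concreteness. The toy data and the variable names y, x1, and x2 are made up for illustration.

```stata
* Hypothetical toy data; "." marks a missing value
clear
input y x1 x2
1 2 3
2 . 1
3 4 .
4 1 2
5 3 5
6 5 4
end

* regress drops observations with a missing y, x1, or x2
regress y x1 x2

* e(sample) is 1 for observations used in the estimation, 0 otherwise
generate byte sample = e(sample)

* Option 1: restrict later commands to the estimation sample
summarize y x1 x2 if sample == 1

* Option 2: delete the incomplete cases outright
keep if sample == 1
```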
Is there any better solution?
I’m not sure exactly how you’re accustomed to using `complete.cases` in R, but here is an example with an application both in R and in a Stata equivalent (`rmiss2`):
First, let’s make up some data in R for demonstration. We’ll save it as a `dta` file that we can use in Stata later on.
The data look like this:
Running `complete.cases` on the data simply gives us a vector of `TRUE`s and `FALSE`s telling us whether each row represents a complete case.
More often, `complete.cases` is useful for subsetting our data, as in the following:
Or, here, subsetting just based on whether the first three columns are complete.
Now, let’s switch to Stata.
First, install `rmiss2` if you don’t already have it.
Second, load the dta file that we created in R.
Third, we’ll use `rmiss2` to generate a new column named “nmis” that tells us how many variables are missing for each case.
Finally, we can use `keep if...` to drop cases with missing data.
As with `complete.cases`, you can also specify which columns to check for completeness.
Update
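The counting and dropping steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: the data are typed in inline rather than loaded from the dta file, the variable names x1 to x4 and nmis3 are hypothetical, and Stata's built-in `egen` function `rowmiss()` stands in for `rmiss2` so that nothing needs to be installed. As far as I know, `rmiss2` is called the same way, as `egen nmis = rmiss2(varlist)`.

```stata
* Hypothetical toy data; "." marks a missing value
clear
input x1 x2 x3 x4
1 2 3 4
. 2 3 4
1 . 3 .
1 2 3 .
5 6 7 8
end

* Number of missing values per case across all four variables
egen nmis = rowmiss(x1 x2 x3 x4)

* Number of missing values across the first three variables only
egen nmis3 = rowmiss(x1 x2 x3)

list

* Keep only fully complete cases, then undo so the next step sees all rows
preserve
keep if nmis == 0
list
restore

* Keep cases that are complete on the first three variables
keep if nmis3 == 0
list
```

With this toy data, the first `keep if` leaves the first and last rows, and the second also retains the fourth row, which is only missing x4.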
It should be noted that `keep if...` is “destructive”: you can’t get back to your original dataset without reloading your dta file. As such, it is safer to use `if` to restrict the commands you run to complete cases, rather than dropping rows.
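A minimal sketch of the non-destructive approach, again with made-up data and hypothetical variable names (y, x1, x2), and with the built-in `rowmiss()` assumed in place of `rmiss2`:

```stata
* Hypothetical toy data; "." marks a missing value
clear
input y x1 x2
1 2 3
2 . 1
3 4 .
4 1 2
end

* Flag incomplete cases without deleting anything
egen nmis = rowmiss(y x1 x2)

* Restrict individual commands to complete cases; all rows stay in memory
list if nmis == 0
summarize x1 if nmis == 0

* Same idea without a helper variable:
* missing() returns 1 if any of its arguments is missing
count if !missing(y, x1, x2)
```

Because no rows are dropped, you can change the condition (for example, checking fewer variables) without reloading the data.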