Question

Editorial Team

Asked: June 13, 2026 (2026-06-13T18:48:36+00:00)

As the title says, I’m trying to find a Stata equivalent of R’s complete.cases function.

The closest I have got so far is to use

generate sample = e(sample)

after running a regression, and then either deleting cases or using an if clause on this newly generated variable (solution borrowed from here).

Is there a better solution?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T18:48:37+00:00

I’m not sure exactly how you’re used to using complete.cases in R, but here is an example that shows it in R alongside a Stata equivalent (rmiss2, an add-on function for egen):

First, let’s make up some data in R for demonstration. We’ll save it as a dta file that we can use in Stata later on.

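library(foreign)  # provides write.dta
set.seed(1)       # make the random data reproducible
dat <- data.frame(one = rnorm(15),
                  two = sample(LETTERS, 15),
                  three = rnorm(15),
                  four = runif(15))
# Replace five randomly chosen values in each column with NA
dat <- data.frame(lapply(dat, function(x) { x[sample(15, 5)] <- NA; x }))
# Save in Stata format for use later on
write.dta(dat, file="completeCases.dta")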

The data look like this:

dat
#           one  two       three      four
# 1          NA    M  0.80418951 0.8921983
# 2   0.1836433    O -0.05710677        NA
# 3  -0.8356286    L  0.50360797 0.3899895
# 4          NA    E          NA        NA
# 5   0.3295078    S          NA 0.9606180
# 6  -0.8204684 <NA> -1.28459935 0.4346595
# 7   0.4874291 <NA>          NA        NA
# 8   0.7383247    C -0.23570656 0.3999944
# 9          NA    N -0.54288826 0.3253522
# 10 -0.3053884 <NA>          NA 0.7570871
# 11         NA    R -0.64947165 0.2026923
# 12  0.3898432 <NA>          NA        NA
# 13         NA    K  1.15191175        NA
# 14 -2.2146999 <NA>  0.99216037 0.2454885
# 15  1.1249309    Q -0.42951311 0.1433044

Running complete.cases on the data simply gives us a vector of TRUEs and FALSEs telling us if each row represents a complete case.

complete.cases(dat)
#  [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE 
#  [9] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

More often, complete.cases is useful for subsetting our data, as in the following:

dat[complete.cases(dat), ]
#           one two      three      four
# 3  -0.8356286   L  0.5036080 0.3899895
# 8   0.7383247   C -0.2357066 0.3999944
# 15  1.1249309   Q -0.4295131 0.1433044

Or, here, subsetting just based on whether the first three columns are complete.

dat[complete.cases(dat[, 1:3]), ]
#           one two       three      four
# 2   0.1836433   O -0.05710677        NA
# 3  -0.8356286   L  0.50360797 0.3899895
# 8   0.7383247   C -0.23570656 0.3999944
# 15  1.1249309   Q -0.42951311 0.1433044

Now, let’s switch to Stata.

First, install rmiss2 if you don’t already have it.

. findit rmiss2

Second, load the dta file that we created in R.

. use "path\to\completeCases.dta", clear

Third, we’ll use rmiss2 to generate a new column named “nmis” that tells us how many variables are missing for each case.

. egen nmis = rmiss2(one two three four)
. list

     +-----------------------------------------------+
     |       one   two       three       four   nmis |
     |-----------------------------------------------|
  1. |         .     M    .8041895   .8921983      1 |
  2. |  .1836433     O   -.0571068          .      1 |
  3. | -.8356286     L     .503608   .3899895      0 |
  4. |         .     E           .          .      3 |
  5. |  .3295078     S           .    .960618      1 |
     |-----------------------------------------------|
  6. | -.8204684     .   -1.284599   .4346595      1 |
  7. |  .4874291     .           .          .      3 |
  8. |  .7383247     C   -.2357066   .3999944      0 |
  9. |         .     N   -.5428883   .3253522      1 |
 10. | -.3053884     .           .   .7570871      2 |
     |-----------------------------------------------|
 11. |         .     R   -.6494716   .2026923      1 |
 12. |  .3898432     .           .          .      3 |
 13. |         .     K    1.151912          .      2 |
 14. |   -2.2147     .    .9921604   .2454885      1 |
 15. |  1.124931     Q   -.4295131   .1433044      0 |
     +-----------------------------------------------+
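
Before dropping anything, it can help to see how the cases are spread across levels of missingness. The following is a minimal sketch that assumes the nmis variable created above is still in memory; it only uses the built-in tabulate and count commands.

```stata
* Frequency table of the number of missing values per case
tabulate nmis

* Number of complete cases (rows with no missing values)
count if nmis == 0
```

For the data listed above, the complete cases are the rows where nmis is 0.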

Finally, we can use keep if... to drop cases with missing data.

. keep if (nmis == 0)
(12 observations deleted)

. list

     +-----------------------------------------------+
     |       one   two       three       four   nmis |
     |-----------------------------------------------|
  1. | -.8356286     L     .503608   .3899895      0 |
  2. |  .7383247     C   -.2357066   .3999944      0 |
  3. |  1.124931     Q   -.4295131   .1433044      0 |
     +-----------------------------------------------+

As with complete.cases, you can also specify which columns to check for completeness.

. use "path\to\completeCases.dta", clear
(Written by R.              )

. egen nmis = rmiss2(one two three)

. keep if (nmis == 0)
(11 observations deleted)

. list

     +-----------------------------------------------+
     |       one   two       three       four   nmis |
     |-----------------------------------------------|
  1. |  .1836433     O   -.0571068          .      0 |
  2. | -.8356286     L     .503608   .3899895      0 |
  3. |  .7383247     C   -.2357066   .3999944      0 |
  4. |  1.124931     Q   -.4295131   .1433044      0 |
     +-----------------------------------------------+
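
If installing an add-on is not an option, a similar flag can probably be built with Stata's built-in missing() function, which returns 1 when any of its arguments is missing. This is a sketch rather than a drop-in replacement for rmiss2: it assumes the same variable names as above, the variable name complete is made up for illustration, and it yields a 0/1 indicator instead of a count of missing values.

```stata
* Reload the example data (same placeholder path as above)
use "path\to\completeCases.dta", clear

* 1 if no listed variable is missing, 0 otherwise
generate byte complete = !missing(one, two, three, four)

* Keep only the complete cases
keep if complete
```

To check only some columns, list just those variables inside missing(), for example missing(one, two, three).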

Update

Note that keep if... is “destructive”: you can’t get back to your original dataset without reloading your dta file. It is therefore safer to use an if qualifier, as follows:

. summarize one two three four if  nmis == 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         one |         3    .3425423    1.038475  -.8356286   1.124931
         two |         3    6.666667    5.507571          1         12
       three |         3   -.0538706    .4924195  -.4295131    .503608
        four |         3    .3110961     .145398   .1433044   .3999944
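
When a command sequence really does need the reduced dataset, another non-destructive route is to wrap the keep in preserve and restore, both built into Stata. A minimal sketch, assuming nmis has been created over all four variables as above:

```stata
* Take a temporary snapshot of the data in memory
preserve

* Work with complete cases only
keep if nmis == 0
list

* Bring back the full dataset, including incomplete cases
restore
```

In a do-file, the data are also restored automatically when the do-file ends, unless restore, not is issued.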
