I would like to count the rows of a data frame according to how many variables are missing in each row. For example, for the data frame below I would like the code to return the list:
3, 5, 1, 1, 0
because there are 3 rows with no missing variables, 5 rows with 1 missing variable, 1 row with 2 missing variables, 1 row with 3 missing variables and 0 rows with 4 missing variables:
   v1 v2 v3 v4
1   1  1  1  1
2  NA NA  1  1
3   1  1 NA  1
4   1  1  1  1
5  NA  1  1  1
6  NA  1  1  1
7   1  1  1 NA
8  NA  1  1  1
9   1  1  1  1
10  1 NA NA NA
Here is the example data that can be loaded in R:
dt <- structure(list(
    v1 = c(1, NA, 1, 1, NA, NA, 1, NA, 1, 1),
    v2 = c(1, NA, 1, 1, 1, 1, 1, 1, 1, NA),
    v3 = c(1, 1, NA, 1, 1, 1, 1, 1, 1, NA),
    v4 = c(1, 1, 1, 1, 1, 1, NA, 1, 1, NA)),
  .Names = c("v1", "v2", "v3", "v4"),
  row.names = c(NA, -10L),
  class = "data.frame")
I can already do this by looping through the data frame row by row and incrementing a counter for each number of missing variables, but that is extremely slow on large data frames, so I was hoping there is a slick way to do it?
If you really need the last 0 (four NAs):
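A minimal sketch of one way to get that, assuming `dt` is the data frame from the question (the names `na_per_row` and `counts` are just illustrative). `rowSums(is.na(dt))` gives the number of missing values per row without an explicit loop, and wrapping it in a factor with levels `0:ncol(dt)` makes `table()` report the counts that would otherwise be dropped because no row has them:

```r
# Example data from the question
dt <- data.frame(
  v1 = c(1, NA, 1, 1, NA, NA, 1, NA, 1, 1),
  v2 = c(1, NA, 1, 1, 1, 1, 1, 1, 1, NA),
  v3 = c(1, 1, NA, 1, 1, 1, 1, 1, 1, NA),
  v4 = c(1, 1, 1, 1, 1, 1, NA, 1, 1, NA)
)

# Number of NAs in each row (vectorised, no row-by-row loop)
na_per_row <- rowSums(is.na(dt))

# Plain table(): only tabulates the NA counts that actually occur (0 to 3 here)
table(na_per_row)

# Force every possible count from 0 to ncol(dt) to appear, including empty ones
counts <- table(factor(na_per_row, levels = 0:ncol(dt)))

# Bare vector of counts, in order of 0, 1, 2, 3, 4 missing variables
as.vector(counts)
```

For the example data, `as.vector(counts)` gives 3, 5, 1, 1, 0, matching the list in the question.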