I have a large R data.table with a multi column key, where some value

Question

0

Asked: June 17, 20262026-06-17T07:49:10+00:00 2026-06-17T07:49:10+00:00

I have a large R data.table with a multi column key, where some value

0

I have a large R data.table with a multi column key, where some value columns contain some NA. I’d like to remove groups that are entirely NA in one or more value columns, but keep the whole group otherwise. Repeating this for each column of the key.

To give a simplified example:

library(data.table)
DT = data.table(
    Series = rep(letters[1:12], each = 3), 
    Id = 1:12,
    Value1 = c(1:3, NA, 5:9, rep(NA,3), 1:3, NA, 5:9, rep(NA,3), 1:3, NA, 5:9, rep(NA,3)), 
    Value2 = c(rep(NA,3), 1:4, NA, 6:9, rep(NA,3), 1:9, 1:9, rep(NA,3)))
DT
    Series Id Value1 Value2
 1:      a  1      1     NA
 2:      a  2      2     NA
 3:      a  3      3     NA
 4:      b  4     NA      1
 5:      b  5      5      2
 6:      b  6      6      3
 7:      c  7      7      4
 8:      c  8      8     NA
 9:      c  9      9      6
10:      d 10     NA      7
11:      d 11     NA      8
12:      d 12     NA      9
13:      e  1      1     NA
14:      e  2      2     NA
15:      e  3      3     NA
16:      f  4     NA      1
17:      f  5      5      2
18:      f  6      6      3
19:      g  7      7      4
20:      g  8      8      5
21:      g  9      9      6
22:      h 10     NA      7
23:      h 11     NA      8
24:      h 12     NA      9
25:      i  1      1      1
26:      i  2      2      2
27:      i  3      3      3
28:      j  4     NA      4
29:      j  5      5      5
30:      j  6      6      6
31:      k  7      7      7
32:      k  8      8      8
33:      k  9      9      9
34:      l 10     NA     NA
35:      l 11     NA     NA
36:      l 12     NA     NA
    Series Id Value1 Value2

So I would like to drop:

Series: a,d,e,h and l
Ids: 4, 10,11 and 12

Correct result should look like:

    Series Id Value1 Value2
 1:      b  5      5      2
 2:      b  6      6      3
 3:      c  7      7      4
 4:      c  8      8     NA
 5:      c  9      9      6
 6:      f  5      5      2
 7:      f  6      6      3
 8:      g  7      7      4
 9:      g  8      8      5
10:      g  9      9      6
11:      i  1      1      1
12:      i  2      2      2
13:      i  3      3      3
14:      j  5      5      5
15:      j  6      6      6
16:      k  7      7      7
17:      k  8      8      8
18:      k  9      9      9
    Series Id Value1 Value2

What I managed so far:

I can find the Series that are NA for Value1 like this:

DT[, sum(1-is.na(Value1)) == 0, by = Series][V1 == TRUE]

And I could even do

setkey(DT, Series)
DT = DT[DT[, sum(1-is.na(Value)) == 0, by = Series][V1 != TRUE]]

But now I end up with V1 appearing in the final table.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-17T07:49:11+00:00

You can do this to get those entries where not ALL Value are NA:

setkey(DT, "Series")
DT[, .SD[(!all(is.na(Value)))], by=Series]

The parens around !all are needed to avoid not-join syntax which Matthew will look into (see comments). Same as this :

DT[, .SD[as.logical(!all(is.na(Value)))], by=Series]

Building on that to answer the new clarified question :

allNA = function(x) all(is.na(x))     # define helper function
for (i in c("Id","Series"))
    DT = DT[, if (!any(sapply(.SD,allNA))) .SD else NULL, by=i]
DT
    Series Id Value1 Value2
 1:      i  1      1      1
 2:      i  2      2      2
 3:      i  3      3      3
 4:      b  5      5      2
 5:      b  6      6      3
 6:      f  5      5      2
 7:      f  6      6      3
 8:      j  5      5      5
 9:      j  6      6      6
10:      c  7      7      4
11:      c  8      8     NA
12:      c  9      9      6
13:      g  7      7      4
14:      g  8      8      5
15:      g  9      9      6
16:      k  7      7      7
17:      k  8      8      8
18:      k  9      9      9

That changes the order, though. So isn’t precisely the result requested. The following keeps the order and should be faster too.

# starting fresh from original DT in question again
DT[,drop:=FALSE]
for (i in c("Series","Id"))
    DT[,drop:=drop|any(sapply(.SD,allNA)),by=i]
DT[(!drop)][,drop:=NULL][]
    Series Id Value1 Value2
 1:      b  5      5      2
 2:      b  6      6      3
 3:      c  7      7      4
 4:      c  8      8     NA
 5:      c  9      9      6
 6:      f  5      5      2
 7:      f  6      6      3
 8:      g  7      7      4
 9:      g  8      8      5
10:      g  9      9      6
11:      i  1      1      1
12:      i  2      2      2
13:      i  3      3      3
14:      j  5      5      5
15:      j  6      6      6
16:      k  7      7      7
17:      k  8      8      8
18:      k  9      9      9

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a large R data.table with a multi column key, where some value

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply