I have the following code using data.frames, and I’m wondering how to write this

Question

Asked: June 13, 2026 (2026-06-13T11:17:40+00:00)

I have the following code using data.frames, and I’m wondering how to write this using data.tables, using the most efficient, most vectorized code?

data.frame code:

set.seed(1)
to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from
to

rownames(to) <- to$time
to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
to

Running this:

>     set.seed(1)
>     to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
>     from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
>     from
  time blah foo  1  2   3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1    1   66  22 98  2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2    2   35  13 74 72  50 52  8 57 61 18 56 53 90  7 85 65 20 76 39 12
3    3   27  47 36 11  49 21  4 53 24 75 33  8 45 34 86 75 89 73 11 85
4    4   97  90 44 45  18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5    5   61  58 15 65  76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
>     to
  time bananas apples  1   2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1    1      27     90 21  50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2    2      37     94 18  72 22  2 60 80 65  3 87 32 30 48 84 87 72 72  6 46
3    3      57     65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4    4      89     62 39  39 13 87 19 73 56 74 25 67 34  9 34 78 33 25 88 82
5    5      20      6 77  78 27 35 83 42 53 70  8 41 66 88 48 97 76 15 78 61
> 
>     rownames(to) <- to$time
>     to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
>     to
  time bananas apples  1  2   3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1    1      27     90 98  2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2    2      37     94 74 72  50 52  8 57 61 18 56 53 90  7 85 65 20 76 39 12
3    3      57     65 36 11  49 21  4 53 24 75 33  8 45 34 86 75 89 73 11 85
4    4      89     62 44 45  18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5    5      20      6 15 65  76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79

Basically, we update columns paste0(1:18) of to from columns paste0(1:18) of from, matching up the times.

data.tables apparently have some advantages, such as not needing head when printing them at the console, so I’m thinking about using them.

However I’d like not to have to write the := expressions by hand, ie try to avoid:

to[from,`1`:=i.`1`,`2`:=i.`2`, ..]

I’d also prefer to use vectorized syntax if possible, rather than some kind of for loop, ie try to avoid something like:

for( i in 1:18 ) {
    to[from, sprintf("%d",i) := i.sprintf("%d",i)]
}

I read through the faq vignette, and the datatable-intro vignette, though I admit I probably haven’t understood everything 100%.

I looked at "Loop through columns in a data.table and transform those columns", but I can’t say I understand it 100%, and it seems to say that I need to use a for loop?

There does seem to be some kind of a hint at the bottom of 8374816 that it might be possible to just use data frame syntax, adding with=FALSE? But since the data.frame procedure is hacking on the row names, I’m not sure how well / if that will work, and I wonder to what extent that makes use of the efficiencies of data.table?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T11:17:41+00:00

Good question. The base construct you’ve shown:

to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]

works on the assumption that row names are not duplicated; if they were, only the first would be matched. Here, the LHS of <- has the same number of rows as the RHS of <-.
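To make the row-name matching concrete, here is a minimal base R sketch on made-up toy data (the column a and the values are hypothetical, not taken from the question). It only illustrates that rows are matched by name rather than by position:

```r
# Toy data (hypothetical): 'from' lists times out of order and omits time 2.
to   <- data.frame(time = 1:3, a = c(10, 20, 30))
from <- data.frame(time = c(3, 1), a = c(300, 100))

# Row names become "1", "2", "3", so rows can be looked up by name.
rownames(to) <- to$time

# Rows of 'to' are selected by row name, in the order given by from$time.
to[as.character(from$time), "a"] <- from[, "a"]

to$a  # 100 20 300: time 2 is untouched, times 1 and 3 are updated
```

Because each row name identifies exactly one row, the selected rows on the left line up one-to-one with the rows of from on the right.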

data.table is different, since multiple rows in to may routinely match; the default for mult is "all". data.table also prefers long format to wide. So this question is putting data.table through its paces on something it wasn’t really designed for. If those 18 columns contain NA values (i.e. the data are sparse), then a long format may be more appropriate. If all 18 columns are the same type, then a matrix may be more appropriate.
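As a small illustration of the "multiple rows may match" point, the sketch below uses hypothetical toy data in which time 1 appears twice in to. It assumes a data.table version that supports the on= argument (newer than the version discussed in this answer); with keys set via setkey, the on= argument could be dropped:

```r
library(data.table)

# Toy data (hypothetical): time 1 occurs twice in 'to'.
to   <- data.table(time = c(1L, 1L, 2L), x = c(0, 0, 0))
from <- data.table(time = c(1L, 2L),     x = c(10, 20))

# Update join: every row of 'to' that matches a row of 'from' is updated.
to[from, x := i.x, on = "time"]

to$x  # 10 10 20: both rows with time 1 received the value from 'from'
```

This is the behaviour that base row-name matching cannot express, since row names must be unique.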

That said, here are three data.table options for completeness.

1. Using := but without a for loop (multiple LHS and multiple RHS in LHS:=RHS)

from = as.data.table(from)
to = as.data.table(to)
from
   time blah foo  1  2   3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1:    1   66  22 98  2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2:    2   35  13 74 72  50 52  8 57 61 18 56 53 90  7 85 65 20 76 39 12
3:    3   27  47 36 11  49 21  4 53 24 75 33  8 45 34 86 75 89 73 11 85
4:    4   97  90 44 45  18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5:    5   61  58 15 65  76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
to
   time bananas apples  1   2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1:    1      27     90 21  50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2:    2      37     94 18  72 22  2 60 80 65  3 87 32 30 48 84 87 72 72  6 46
3:    3      57     65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4:    4      89     62 39  39 13 87 19 73 56 74 25 67 34  9 34 78 33 25 88 82
5:    5      20      6 77  78 27 35 83 42 53 70  8 41 66 88 48 97 76 15 78 61
setkey(to,time)
setkey(from,time)
to[from,paste0(1:18):=from[.GRP,paste0(1:18),with=FALSE]]
   time bananas apples  1  2   3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1:    1      27     90 98  2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2:    2      37     94 74 72  50 52  8 57 61 18 56 53 90  7 85 65 20 76 39 12
3:    3      57     65 36 11  49 21  4 53 24 75 33  8 45 34 86 75 89 73 11 85
4:    4      89     62 44 45  18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5:    5      20      6 15 65  76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79

or

to[from,paste0(1:18):=from[,paste0(1:18),with=FALSE],mult="first"]
   time bananas apples  1  2   3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1:    1      27     90 98  2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2:    2      37     94 74 72  50 52  8 57 61 18 56 53 90  7 85 65 20 76 39 12
3:    3      57     65 36 11  49 21  4 53 24 75 33  8 45 34 86 75 89 73 11 85
4:    4      89     62 44 45  18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5:    5      20      6 15 65  76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79

Note that I’m using the latest version, v1.8.3, which is needed for option 1 to work (.GRP has just been added, and the outer with=FALSE is no longer needed).
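For readers on a more recent data.table, a commonly used alternative to option 1 is to build the i.-prefixed names programmatically and fetch them with mget. This is a sketch rather than the method used above: it assumes a version that supports on= and the (cols) := form, and the column names v1 to v3 and all values are hypothetical, chosen only to keep the example short:

```r
library(data.table)

cols <- paste0("v", 1:3)  # hypothetical names standing in for the 18 columns

to   <- data.table(time = 1:3, bananas = c(5L, 6L, 7L),
                   v1 = 0L, v2 = 0L, v3 = 0L)
from <- data.table(time = 3:1, v1 = 1:3, v2 = 4:6, v3 = 7:9)

# (cols) on the LHS means "the columns named in cols";
# mget() returns the matching i.-prefixed columns of 'from' as a list.
to[from, (cols) := mget(paste0("i.", cols)), on = "time"]

to$v1  # 3 2 1: rows were matched on time, not on position
```

No loop and no hand-written := expressions are needed, and the update is still done by reference.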

2. Use one list column to store the length 18 vectors, rather than 18 columns

to = data.table( time=seq(1:5),
                 bananas=sample(100,5),
                 apples=sample(100,5),  
                 v18=replicate(5,sample(100,18),simplify=FALSE))
from =  data.table( time=seq(1:5),
                    blah=sample(100,5),
                    foo=sample(100,5),
                    v18=replicate(5,sample(100,18),simplify=FALSE))
setkey(to,time)
setkey(from,time)

from
   time blah foo                 v18
1:    1   56  97   88,47,1,71,69,18,
2:    2   69  40   96,99,60,3,33,27,
3:    3   65  84 100,38,56,72,84,55,
4:    4   98  74 91,69,24,63,27,100,
5:    5   46  52    65,4,59,41,8,51,

to
   time bananas apples                 v18
1:    1      66     73 100,36,74,77,68,46,
2:    2      19     37   84,88,92,8,37,52,
3:    3      94     77   37,94,13,7,93,43,
4:    4      88      2  27,93,71,16,46,66,
5:    5      91     91   85,94,58,49,19,1,

to[from,v18:=i.v18]
to
   time bananas apples                 v18
1:    1      66     73   88,47,1,71,69,18,
2:    2      19     37   96,99,60,3,33,27,
3:    3      94     77 100,38,56,72,84,55,
4:    4      88      2 91,69,24,63,27,100,
5:    5      91     91    65,4,59,41,8,51,

If you are not used to list column printing, the trailing comma signifies that more items are in that vector. Just the first 6 are printed.
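If list columns are unfamiliar, the following sketch shows how the full vector behind the truncated printout can be inspected. The table dt and its contents are hypothetical and deliberately simple:

```r
library(data.table)

# Each cell of v18 holds a whole length-18 vector.
dt <- data.table(time = 1:2, v18 = list(1:18, 18:1))

is.list(dt$v18)            # the column is a list, one element per row
lengths(dt$v18)            # length of the vector stored in each row

# Extract the complete vector for one row; [[1]] unwraps the list element.
dt[time == 2L, v18[[1]]]

# Or index into a single cell directly.
dt$v18[[1]][1:3]
```

Only the console printing is truncated; the stored vectors keep all 18 values.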

3. Use data.frame syntax on the data.table

to = as.data.table(to)
from = as.data.table(from)
setkey(to,time)
setkey(from,time)

from
   time blah foo  1  2   3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1:    1   66  22 98  2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2:    2   35  13 74 72  50 52  8 57 61 18 56 53 90  7 85 65 20 76 39 12
3:    3   27  47 36 11  49 21  4 53 24 75 33  8 45 34 86 75 89 73 11 85
4:    4   97  90 44 45  18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5:    5   61  58 15 65  76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79

to
   time bananas apples  1   2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1:    1      27     90 21  50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2:    2      37     94 18  72 22  2 60 80 65  3 87 32 30 48 84 87 72 72  6 46
3:    3      57     65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4:    4      89     62 39  39 13 87 19 73 56 74 25 67 34  9 34 78 33 25 88 82
5:    5      20      6 77  78 27 35 83 42 53 70  8 41 66 88 48 97 76 15 78 61

to[from, paste0(1:18)] <- from[,paste0(1:18),with=FALSE]
to
   time bananas apples  1  2   3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1:    1      27     90 98  2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2:    2      37     94 74 72  50 52  8 57 61 18 56 53 90  7 85 65 20 76 39 12
3:    3      57     65 36 11  49 21  4 53 24 75 33  8 45 34 86 75 89 73 11 85
4:    4      89     62 44 45  18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5:    5      20      6 15 65  76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79

So the LHS of <- can use data.table keyed join syntax, i.e. to[from]. It’s just that this method (currently in R) will copy the entire to dataset. That’s what := was introduced to avoid, by providing update by reference. Also, if each row in from matches multiple rows in to, then the RHS of <- would need to be expanded to line up (by you, the user); otherwise the RHS would be recycled to fill up the LHS. That’s one reason why, in data.table, we like := being inside j, all inside [...].
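What "update by reference" means in practice can be seen in a short sketch with hypothetical toy tables a, b and d. It shows only the reference semantics of :=, not the cost of the copy made by <-:

```r
library(data.table)

a <- data.table(x = 1:3)

# Plain assignment does not copy a data.table; 'b' is the same object as 'a'.
b <- a
b[, x := x * 10L]   # := modifies the table in place
a$x                 # 10 20 30: 'a' changed too

# copy() makes an independent table, so later updates leave 'a' alone.
d <- copy(a)
d[, x := 0L]
a$x                 # still 10 20 30
```

Since := never copies the table, it stays cheap on large data, but an explicit copy() is needed whenever the original must be preserved.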
