I have monthly data in one data.table and annual data in another data.table and

Question

0

Asked: May 26, 20262026-05-26T14:48:13+00:00 2026-05-26T14:48:13+00:00

I have monthly data in one data.table and annual data in another data.table and

0

I have monthly data in one data.table and annual data in another data.table and now I want to match the annual data to the respective observation in the monthly data.

My approach is as follows: Duplicating the annual data for every month and then join the monthly and annual data. And now I have a question regarding the duplication of rows. I know how to do it, but I’m not sure if it is the best way to do it, so some opinions would be great.

Here is an exemplatory data.table DT for my annual data and how I currently duplicate:

library(data.table)
DT <- data.table(ID = paste(rep(c("a", "b"), each=3), c(1:3, 1:3), sep="_"),
                    values = 10:15,
                    startMonth = seq(from=1, by=2, length=6),
                    endMonth = seq(from=3, by=3, length=6))
DT
      ID values startMonth endMonth
[1,] a_1     10          1        3
[2,] a_2     11          3        6
[3,] a_3     12          5        9
[4,] b_1     13          7       12
[5,] b_2     14          9       15
[6,] b_3     15         11       18
#1. Alternative
DT1 <- DT[, list(MONTH=startMonth:endMonth), by="ID"]
setkey(DT,  ID)
setkey(DT1, ID)
DT1[DT]
ID MONTH values startMonth endMonth
a_1     1     10          1        3
a_1     2     10          1        3
a_1     3     10          1        3
a_2     3     11          3        6
[...]

The last join is exactly what I want. However, DT[, list(MONTH=startMonth:endMonth), by="ID"] already does everything I want except adding the other columns to DT, so I was wondering if I could get rid of the last three rows in my code, i.e. the setkey and join operations. It turns out, you can, just do the following:

#2. Alternative: More intuitiv and just one line of code
DT[, list(MONTH=startMonth:endMonth, values, startMonth, endMonth), by="ID"]
 ID MONTH values startMonth endMonth
a_1    1     10          1        3
a_1    2     10          1        3
a_1    3     10          1        3
a_2    3     11          3        6
...

This, however, only works because I hardcoded the column names into the list expression. In my real data, I do not know the names of all columns in advance, so I was wondering if I could just tell data.table to return the column MONTH that I compute as shown above and all the other columns of DT. .SD seemed to be able to do the trick, but:

DT[, list(MONTH=startMonth:endMonth, .SD), by="ID"]
Error in `[.data.table`(DT, , list(YEAR = startMonth:endMonth, .SD), by = "ID") : 
  maxn (4) is not exact multiple of this j column's length (3)

So to summarize, I know how it’s been done, but I was just wondering if this is the best way to do it because I’m still struggling a little bit with the syntax of data.table and often read in posts and on the wiki that there are good and bads ways of doing things. Also, I don’t quite get why I get an error when using .SD. I thought it is just any easy way to tell data.table that you want all columns. What do I miss?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T14:48:14+00:00

Great question. What you tried was very reasonable. Assuming you’re using v1.7.1 it’s now easier to make list columns. In this case it’s trying to make one list column out of .SD (3 items) alongside the MONTH column of the 2nd group (4 items). I’ll raise it as a bug [EDIT: now fixed in v1.7.5], thanks.

In the meantime, try :

DT[, cbind(MONTH=startMonth:endMonth, .SD), by="ID"]
 ID MONTH values startMonth endMonth
a_1     1     10          1        3
a_1     2     10          1        3
a_1     3     10          1        3
a_2     3     11          3        6
...

Also, just to check you’ve seen roll=TRUE? Typically you’d have just one startMonth column (irregular with gaps) and then just roll join to it. Your example data has overlapping month ranges though, so that complicates it.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have monthly data in one data.table and annual data in another data.table and

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply