table with data (its a data.table object) that looks like the following :
date stock_id logret
1: 2011-01-01 1 0.001
2: 2011-01-02 1 0.003
3: 2011-01-03 1 0.005
4: 2011-01-04 1 0.007
5: 2011-01-05 1 0.009
6: 2011-01-06 1 0.011
7: 2011-01-01 2 0.013
8: 2011-01-02 2 0.015
9: 2011-01-03 2 0.017
10: 2011-01-04 2 0.019
11: 2011-01-05 2 0.021
12: 2011-01-06 2 0.023
13: 2011-01-01 3 0.025
14: 2011-01-02 3 0.027
15: 2011-01-03 3 0.029
16: 2011-01-04 3 0.031
17: 2011-01-05 3 0.033
18: 2011-01-06 3 0.035
The above can be created as :
DT = data.table(
date=rep(as.Date('2011-01-01')+0:5,3) ,
stock_id=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3),
logret=seq(0.001, by=0.002, len=18));
setkeyv(DT,c('stock_id','date'))
Of course the real table is larger with many more stock_ids and dates. The aim to to reshape this data table such that I can run a regression of all stockid log_returns with their corresponding log_returns with a lag of 1 day (or prior traded day in case of weekends).
The final results would look like :
date stock_id logret lagret
1: 2011-01-01 1 0.001 NA
2: 2011-01-02 1 0.003 0.001
3: 2011-01-03 1 0.005 0.003
....
16: 2011-01-04 3 0.031 0.029
17: 2011-01-05 3 0.033 0.031
18: 2011-01-06 3 0.035 0.033
I’m finding this data structure really tricky to build without mixing up my stockid.
Just some additional notes due to Alex’s comment. The reason you have difficulties understanding what’s going on here is that a lot of things are done within one line. So it’s always a good idea to break things down.
What do we actually want? We want a new column
lagretand the syntax to add a new column in data.table is the following:where
xxxhas to be filled up with whatever you want to have in columnlagret. So if we just want a new column that gives us the rows, we could just callHere, we actually want the lagged value of
logret, but we have to consider that there are many stocks in here. That’s why we do a self-join, i.e. we join the data.tableDTwith itself by the columnsstock_idanddate, but since we want the previous value of each stock, we usedate-1. Note that we have to set the keys first to do such a join:As you can see, we now have what we want.
logretis now lagged by one period. But we actually want that in a new columnlagretinDT, so we just get that column by calling [[3L]] (this means nothing else then get me the third column) and name this new columnlagret:This is already the correct solution. In this simple case, we do not need
roll=TRUEbecause there are no gaps in the dates. However, in a more realistic example (as mentioned above, for instance when we have weekends), there might be gaps. So let’s make such a realistic example by just deleting two days in theDTfor the first stock:As you can see, the problem is now that we don’t have a value for the 6th of January. That’s why we use
roll=TRUE:Just have a look on the documentation on how
roll=TRUEworks exactly. In a nutshell: If it can’t find the previous value (herelogretfor the 5th of January), it just takes the last available one (here from the 3rd of January).