The original dataset is like:
sub month y
1 1 1
1 2 2
1 3 3
1 5 5
What I want is to get the previous 3 ys based on month for each subject, if y is missing at that month, then the new variable is . too. Say , for the above example, lag1 is previous y for last month, lag2 is previous y for 2 months ago, the lag3 is so far so forth:
sub month y lag1 lag2 lag3
1 1 1 . . .
1 2 2 1 . .
1 3 3 2 1 .
1 5 5 . 3 2
The thing is I checked out lag and dif function, but in my case, what I want on lag depends on month and there also are gaps between month, so I can not use the previous one like lag1 function.
Also I need to do this for many subjects too. Thanks.
SQL solution:
Obviously this gets a bit long if you want 36 of them, but it’s not all that complex at least. There are all sorts of other ways to do this. Don’t use LAG, that’s going to be a headache and not really appropriate anyway. A hash table might be more efficient and require less coding, if you’re familiar with the concept of a hash.
Hash solution:
This is pretty simple to expand up to 36 [or whatever] – just change the length of the array to
array lags lag1-lag36and the do statementdo prevmonth=month-1 to month-36 by -1;The most work you may have to do is arrange things so month works here – either by creating a integer month, or changing the loop criteria to work with month/year or whatnot. You don’t show how your data is specified so can’t help there.