I have a dataset with longitudinal data in a person-oriented format, as such: pid

Question

Asked: May 26, 2026

I have a longitudinal dataset in a person-oriented (wide) format, like this:

pid varA_1 varB_1 varA_2 varB_2 varA_3 varB_3 ...
1   1      1      0      3      2      1
2   0      1      0      2      2      1
...
50k 1      0      1      3      1      0

This is a large data frame: at least 50k observations, with 90 variables each measured for up to 29 periods (up to 2,610 variable columns).
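For a sense of scale, here is a back-of-the-envelope estimate in R. It assumes every value is stored as an 8-byte double and that all 29 periods are present; the variable names are only for illustration. A reshaped copy holds the same values again, so while both the wide and the long versions exist, the process needs at least about twice this amount of memory.

```r
# Back-of-the-envelope size of the wide data, assuming 8-byte doubles
n_rows <- 50e3      # persons
n_vars <- 90        # variables per period
n_periods <- 29     # periods

n_values <- n_rows * n_vars * n_periods   # number of stored values
bytes <- n_values * 8                      # bytes at 8 bytes per double
gib <- bytes / 1024^3                      # size in GiB (a little under 1)
print(gib)
```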

I would like to convert it to a period-oriented (long) format, like this:

pid index start stop varA varB varC ...
1   1     ...
1   2     
...
1   29
2   1

I have tried several approaches to reshaping the data frame (*apply, plyr, reshape2, loops, appending vs. preallocating all-numeric matrices, etc.), but I cannot get a reasonable processing time (over 40 minutes, even for subsets). I have picked up various hints along the way on what to avoid, but I am still not sure whether I am missing a bottleneck or a possible speedup.

Is there an optimal way to approach this kind of data processing, so that I can judge the best-case processing time achievable in pure R code? There have been similar questions on Stack Overflow, but they did not lead to convincing answers.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T06:47:14+00:00

First, let's build an example dataset (I use 5e3 observations instead of 50e3 to avoid memory problems on my machine):

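nObs <- 5e3      # number of persons (rows)
nVar <- 90       # variables measured per period
nPeriods <- 29   # number of periods

# Random numeric data: one column per variable-period combination
dat <- matrix(rnorm(nObs * nVar * nPeriods), nrow = nObs, ncol = nVar * nPeriods)

df <- data.frame(id = seq_len(nObs), dat)

# Column names of the form Var1_T1, Var2_T1, ..., Var90_T1, Var1_T2, ...
nmsV <- paste0("Var", seq_len(nVar))
nmsPeriods <- paste0("T", seq_len(nPeriods))

nms <- c(outer(nmsV, nmsPeriods, paste, sep = "_"))
names(df)[-1] <- nms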
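It can help to check the order of the generated names on a tiny case (two variables, three periods; nms_demo is just an illustrative name). outer() pairs every variable with every period, and c() flattens the resulting matrix column by column, so the variable index runs fastest.

```r
# Tiny illustration of the column order produced by c(outer(...))
nms_demo <- c(outer(paste0("Var", 1:2), paste0("T", 1:3), paste, sep = "_"))
print(nms_demo)
# [1] "Var1_T1" "Var2_T1" "Var1_T2" "Var2_T2" "Var1_T3" "Var2_T3"
```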

Then stats::reshape converts it to the long format:

# sep = "_" splits names such as Var1_T1 into the variable Var1 and the time T1
df2 <- reshape(df, direction = "long", varying = 2:(nVar * nPeriods + 1), sep = "_")
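To see what this call does on the layout from the question, here is a sketch that applies the same kind of stats::reshape call to the two sample rows shown there. It keeps only varA/varB and three periods, and the names wide and long are hypothetical. reshape() stacks the data one period at a time, so the result is sorted afterwards to get one block of rows per person. With labels like T1, T2, ... as in df2, the time column comes out as character rather than numeric, at least as far as I can tell from how reshape() guesses times. Sorting it would then put T10 before T2. Numeric suffixes such as the question's varA_1 avoid that.

```r
# The two sample rows from the question (varA/varB, periods 1 to 3)
wide <- data.frame(pid = 1:2,
                   varA_1 = c(1, 0), varB_1 = c(1, 1),
                   varA_2 = c(0, 0), varB_2 = c(3, 2),
                   varA_3 = c(2, 2), varB_3 = c(1, 1))

# sep = "_" lets reshape() split "varA_1" into the variable varA and time 1
long <- reshape(wide, direction = "long", idvar = "pid",
                varying = names(wide)[-1], sep = "_")

# reshape() stacks period after period; sort by person, then period
long <- long[order(long$pid, long$time), ]
print(long)
```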

I am not sure whether this is fast enough for what you need.
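If stats::reshape is still too slow at full size, one pure-R alternative worth timing is to skip data-frame manipulation entirely. Because every varying column in this example is numeric, the values can be treated as one 3-D array and permuted once with aperm(). This is a sketch, not a benchmarked solution. It assumes the columns are ordered Var1_T1, Var2_T1, ..., VarK_T1, Var1_T2, ... exactly as built above. The function name wide_to_long_aperm and the small sizes are hypothetical. The rows come out sorted by id and then period, the order asked for in the question.

```r
# Sketch: wide -> long via one aperm() on a numeric array (base R only).
# Assumes every non-id column is numeric and that columns are ordered
# Var1_T1, Var2_T1, ..., VarK_T1, Var1_T2, ... as in the example above.
wide_to_long_aperm <- function(wide, n_var, n_periods, idcol = "id") {
  m <- as.matrix(wide[, names(wide) != idcol, drop = FALSE])
  n_obs <- nrow(m)
  a <- array(m, dim = c(n_obs, n_var, n_periods))  # a[i, v, t]
  a <- aperm(a, c(3, 1, 2))                         # now a[t, i, v]
  dim(a) <- c(n_periods * n_obs, n_var)             # row = t + (i - 1) * n_periods
  out <- data.frame(rep(wide[[idcol]], each = n_periods),
                    rep(seq_len(n_periods), times = n_obs),
                    a)
  # Variable names come from the first period's columns, minus the "_T1" suffix
  names(out) <- c(idcol, "time", sub("_.*$", "", colnames(m)[seq_len(n_var)]))
  out
}

# Small hypothetical data built the same way as in the answer
set.seed(1)
n_obs <- 5; n_var <- 3; n_periods <- 4
toy <- data.frame(id = seq_len(n_obs),
                  matrix(rnorm(n_obs * n_var * n_periods), nrow = n_obs))
names(toy)[-1] <- c(outer(paste0("Var", seq_len(n_var)),
                          paste0("T", seq_len(n_periods)), paste, sep = "_"))

long <- wide_to_long_aperm(toy, n_var, n_periods)
head(long)
```

To compare the two approaches on your own data, wrap each call in system.time() on the same subset.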
