It is a truth universally acknowledged that R’s base reshape command is speedy and

Question

0

Asked: June 2, 20262026-06-02T07:00:51+00:00 2026-06-02T07:00:51+00:00

It is a truth universally acknowledged that R’s base reshape command is speedy and

0

It is a truth universally acknowledged that R’s base reshape command is speedy and powerful but has miserable syntax. I have therefore written a quick wrapper around it which I will throw into the next release of the taRifx package. Before I did that, however, I want to solicit improvements.

Here’s my version, with updates from @RichieCotton:

# reshapeasy: Version of reshape with way, way better syntax
 # Written with the help of the StackOverflow R community
 # x is a data.frame to be reshaped
 # direction is "wide" or "long"
 # vars are the names of the (stubs of) the variables to be reshaped (if omitted, defaults to everything not in id or vary)
 # id are the names of the variables that identify unique observations
 # vary is the variable that varies.  Going to wide this variable will cease to exist.  Going to long it will be created.
 # omit is a vector of characters which are to be omitted if found at the end of variable names (e.g. price_1 becomes price in long)
 # ... are options to be passed to stats::reshape
reshapeasy <- function( data, direction, id=(sapply(data,is.factor) | sapply(data,is.character)), vary=sapply(data,is.numeric), omit=c("_","."), vars=NULL, ... ) {
  if(direction=="wide") data <- stats::reshape( data=data, direction=direction, idvar=id, timevar=vary, ... )
  if(direction=="long") {
    varying <- which(!(colnames(data) %in% id))
    data <- stats::reshape( data=data, direction=direction, idvar=id, varying=varying, timevar=vary, ... )
  }
  colnames(data) <- gsub( paste("[",paste(omit,collapse="",sep=""),"]$",sep=""), "", colnames(data) )
  return(data)
}

Note that you can move from wide to long without changing the options other than the direction. To me, this is the key to usability.

I’m happy to give acknowledgement in the function help files for any substantial improvements if you chat or e-mail me your info.

Improvements might fall in the following areas:

Naming the function and its arguments
Making it more general (currently it handles a fairly specific case, which I believe to be by far the most common, but it has not yet exhausted the capabilities of stats::reshape)
Code improvements

Examples

Sample data

x.wide <- structure(list(surveyNum = 1:6, pio_1 = structure(c(2L, 2L, 1L, 
2L, 1L, 1L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), pio_2 = structure(c(2L, 1L, 2L, 1L, 
2L, 2L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), pio_3 = structure(c(2L, 2L, 1L, 1L, 
2L, 1L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), caremgmt_1 = structure(c(2L, 1L, 1L, 
2L, 1L, 2L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), caremgmt_2 = structure(c(1L, 2L, 2L, 
2L, 2L, 1L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), caremgmt_3 = structure(c(1L, 2L, 1L, 
2L, 1L, 1L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), prev_1 = structure(c(1L, 2L, 2L, 1L, 
1L, 2L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), prev_2 = structure(c(2L, 2L, 1L, 2L, 
1L, 1L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), prev_3 = structure(c(2L, 1L, 2L, 2L, 
1L, 1L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2"), class = "factor"), price_1 = structure(c(2L, 1L, 2L, 5L, 
3L, 4L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2", "3", "4", "5", "6"), class = "factor"), price_2 = structure(c(6L, 
5L, 5L, 4L, 4L, 2L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2", "3", "4", "5", "6"), class = "factor"), price_3 = structure(c(3L, 
5L, 2L, 5L, 4L, 5L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("1", 
"2", "3", "4", "5", "6"), class = "factor")), .Names = c("surveyNum", 
"pio_1", "pio_2", "pio_3", "caremgmt_1", "caremgmt_2", "caremgmt_3", 
"prev_1", "prev_2", "prev_3", "price_1", "price_2", "price_3"
), idvars = "surveyNum", rdimnames = list(structure(list(surveyNum = 1:24), .Names = "surveyNum", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"
), class = "data.frame"), structure(list(variable = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("pio", 
"caremgmt", "prev", "price"), class = "factor"), .id = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("variable", 
".id"), row.names = c("pio_1", "pio_2", "pio_3", "caremgmt_1", 
"caremgmt_2", "caremgmt_3", "prev_1", "prev_2", "prev_3", "price_1", 
"price_2", "price_3"), class = "data.frame")), row.names = c(NA, 
6L), class = c("cast_df", "data.frame"))

x.long <- structure(list(.id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), pio = structure(c(2L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 
1L, 2L, 2L, 1L, 2L, 1L, 1L), .Label = c("1", "2"), class = "factor"), 
    caremgmt = structure(c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
    1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 
    1L, 2L, 2L), .Label = c("1", "2"), class = "factor"), prev = structure(c(1L, 
    2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 
    1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 
    2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
    2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("1", 
    "2"), class = "factor"), price = structure(c(2L, 1L, 2L, 
    5L, 3L, 4L, 1L, 5L, 4L, 3L, 1L, 2L, 6L, 6L, 5L, 4L, 6L, 3L, 
    5L, 6L, 3L, 1L, 2L, 4L, 3L, 5L, 2L, 5L, 4L, 5L, 6L, 6L, 4L, 
    6L, 4L, 1L, 2L, 3L, 1L, 2L, 2L, 5L, 1L, 6L, 1L, 3L, 4L, 3L, 
    6L, 5L, 5L, 4L, 4L, 2L, 2L, 2L, 6L, 3L, 1L, 4L, 4L, 5L, 1L, 
    3L, 6L, 1L, 3L, 5L, 1L, 3L, 6L, 2L), .Label = c("1", "2", 
    "3", "4", "5", "6"), class = "factor"), surveyNum = c(1L, 
    2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
    15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
    16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 1L, 2L, 3L, 
    4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 
    17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L)), .Names = c(".id", 
"pio", "caremgmt", "prev", "price", "surveyNum"), row.names = c(NA, 
-72L), class = "data.frame")

Examples

> x.wide
  surveyNum pio_1 pio_2 pio_3 caremgmt_1 caremgmt_2 caremgmt_3 prev_1 prev_2 prev_3 price_1 price_2 price_3
1         1     2     2     2          2          1          1      1      2      2       2       6       3
2         2     2     1     2          1          2          2      2      2      1       1       5       5
3         3     1     2     1          1          2          1      2      1      2       2       5       2
4         4     2     1     1          2          2          2      1      2      2       5       4       5
5         5     1     2     2          1          2          1      1      1      1       3       4       4
6         6     1     2     1          2          1          1      2      1      1       4       2       5
> reshapeasy( x.wide, "long", NULL, id="surveyNum", vary="id", sep="_" )
    surveyNum id pio caremgmt prev price
1.1         1  1   2        2    1     2
2.1         2  1   2        1    2     1
3.1         3  1   1        1    2     2
4.1         4  1   2        2    1     5
5.1         5  1   1        1    1     3
6.1         6  1   1        2    2     4
1.2         1  2   2        1    2     6
2.2         2  2   1        2    2     5
3.2         3  2   2        2    1     5
4.2         4  2   1        2    2     4
5.2         5  2   2        2    1     4
6.2         6  2   2        1    1     2
1.3         1  3   2        1    2     3
2.3         2  3   2        2    1     5
3.3         3  3   1        1    2     2
4.3         4  3   1        2    2     5
5.3         5  3   2        1    1     4
6.3         6  3   1        1    1     5

> head(x.long)
  .id pio caremgmt prev price surveyNum
1   1   2        2    1     2         1
2   1   2        1    2     1         2
3   1   1        1    2     2         3
4   1   2        2    1     5         4
5   1   1        1    1     3         5
6   1   1        2    2     4         6

> head(reshapeasy( x.long, direction="wide", id="surveyNum", vary=".id" ))
  surveyNum pio.1 caremgmt.1 prev.1 price.1 pio.3 caremgmt.3 prev.3 price.3 pio.2 caremgmt.2 prev.2 price.2
1         1     2          2      1       2     2          1      2       3     2          1      2       6
2         2     2          1      2       1     2          2      1       5     1          2      2       5
3         3     1          1      2       2     1          1      2       2     2          2      1       5
4         4     2          2      1       5     1          2      2       5     1          2      2       4
5         5     1          1      1       3     2          1      1       4     2          2      1       4
6         6     1          2      2       4     1          1      1       5     2          1      1       2

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-02T07:00:53+00:00

I would also like to see an option to order the output, since that’s one of the things I don’t like about reshape in base R. As an example, let’s use the Stata Learning Module: Reshaping data wide to long, which you are already familiar with. The example I’m looking at is the “kids height and weight at age 1 and age 2” example.

Here’s what I normally do with reshape():

# library(foreign)
kidshtwt = read.dta("http://www.ats.ucla.edu/stat/stata/modules/kidshtwt.dta")
kidshtwt.l = reshape(kidshtwt, direction="long", idvar=1:2, 
                     varying=3:6, sep="", timevar="age")
# The reshaped data is correct, just not in the order I want it
# so I always have to do another step like this
kidshtwt.l = kidshtwt.l[order(kidshtwt.l$famid, kidshtwt.l$birth),]

Since this is an annoying step that I always have to go through when reshaping the data, I think it would be useful to add that into your function.

I also suggest at least having an option for doing the same thing with the final column order for reshaping from long to wide.

Example function for column ordering

I’m not sure of the best way to integrate this into your function, but I put this together to sort a data frame based on basic patterns for the variable names.

col.name.sort = function(data, patterns) {
  a = names(data)
  b = length(patterns)

  subs = vector("list", b)

  for (i in 1:b) {
    subs[[i]] = sort(grep(patterns[i], a, value=T))
    }
  x = unlist(subs)
  data[ , x ]
}

It can be used in the following manner. Imagine we had saved the output of your reshapeasy long to wide example as a data frame named a, and we wanted it ordered by “surveyNum”, “caremgmt” (1-3), “prev” (1-3), “pio” (1-3), and “price” (1-3), we could use:

col.name.sort(a, c("sur", "car", "pre", "pio", "pri"))

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

It is a truth universally acknowledged that R’s base reshape command is speedy and

Leave an answerCancel reply

1 Answer

Example function for column ordering

Leave an answer
Cancel reply