I want to pivot the result column in df horizontally creating a data set with a separate row for each
region, state, county combination where the columns are ordered by year then city.
I also want to identify each row in the new data set by region, state and county and remove the white space between the four results columns. The code below does all of that, but I suspect it is not very efficient.
Is there a way to do this with reshape2 without creating a unique identifier for each group and numbering observations within each group? Is there a way to use apply in place of the for-loop to remove white space from a matrix? (Matrix here being used in a different manner than a mathematical or programming construct.) I realize those are two separate questions and maybe I should post each question separately.
Given that I can achieve the desired result and am only looking to improve the code I do not know whether I should even post this, but I am hoping to learn. Thanks for any advice.
df <- read.table(text= "
region state county city year result
1 1 1 1 1 1
1 1 1 2 1 2
1 1 1 1 2 3
1 1 1 2 2 4
1 1 2 3 1 4
1 1 2 4 1 3
1 1 2 3 2 2
1 1 2 4 2 1
1 2 1 1 1 0
1 2 1 2 1 NA
1 2 1 1 2 0
1 2 1 2 2 0
1 2 2 3 1 2
1 2 2 4 1 2
1 2 2 3 2 2
1 2 2 4 2 2
2 1 1 1 1 9
2 1 1 2 1 9
2 1 1 1 2 8
2 1 1 2 2 8
2 1 2 3 1 1
2 1 2 4 1 0
2 1 2 3 2 1
2 1 2 4 2 0
2 2 1 1 1 2
2 2 1 2 1 4
2 2 1 1 2 6
2 2 1 2 2 8
2 2 2 3 1 3
2 2 2 4 1 3
2 2 2 3 2 2
2 2 2 4 2 2
", header=TRUE, na.strings=NA)
desired.result <- read.table(text= "
region state county results
1 1 1 1234
1 1 2 4321
1 2 1 0.00
1 2 2 2222
2 1 1 9988
2 1 2 1010
2 2 1 2468
2 2 2 3322
", header=TRUE, colClasses=c('numeric','numeric','numeric','character'))
# redefine variables for package reshape2 creating a unique id for each
# region, state, county combination and then number observations in
# each of those combinations
library(reshape2)
id.var <- df$region*100000 + df$state*1000 + df$county
obsnum <- sequence(rle(id.var)$lengths)
df2 <- dcast(df, region + state + county ~ obsnum, value.var = "result")
# remove spaces between columns of results matrix
# with a for-loop. How can I use apply to do this?
x <- df2[,4:(4+max(obsnum)-1)]
# use a dot to represent a missing observation
x[is.na(x)] = '.'
x.cat = numeric(nrow(x))
for(i in 1:nrow(x)) {
x.cat[i] = paste(x[i,], collapse="")
}
df3 <- cbind(df2[,1:3],x.cat)
colnames(df3) <- c("region", "state", "county", "results")
df3
df3 == desired.result
EDIT:
Matthew Lundberg’s answer below is excellent. Afterwards I realized I also needed to create an output data set in which the four result columns above contain numeric, rational numbers and are separated by a space. So, I have posted an apparent way to do that below that modifies Matthew’s answer. I do not know whether this is accepted protocol, but the new scenario seems so immediately related to the original post that I did not think I should post a new question.
I think this does what you want:
This relies on your data frame being sorted in the proper order (as yours is).