I would like to reshape a data frame of dimensions 103740 x 4 into wide format, i.e., to go from the first block of text below to the second. The following function works well enough when applied to part of the data frame but it exhausts my machine’s available memory when applied to the full data frame:
reshape(analysis1[1:8, ], idvar = c("name", "sire_name"),
        timevar = "course", direction = "wide")[, -c(1, 2)]
What are some better ways to unstack this data frame?
           name sire_name        course appearance
1      A Beetoo   Bahhare       Kempton          0
2      A Beetoo   Bahhare     Lingfield          9
3      A Beetoo   Bahhare     Southwell          0
4      A Beetoo   Bahhare Wolverhampton          0
5 A Bid In Time  Danetime       Kempton          0
6 A Bid In Time  Danetime     Lingfield          3
7 A Bid In Time  Danetime     Southwell          2
8 A Bid In Time  Danetime Wolverhampton          2
...
  Kempton Lingfield Southwell Wolverhampton
1       0         9         0             0
5       0         3         2             2
...
I’ve tried moving the data frame into a data table, but just doing this alone doesn’t seem to help.
For a reshape as simple as this one, base R is likely to be the most efficient option. I'd just make a matrix out of appearance, convert it to a data.frame, and name the columns.
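A minimal sketch of that idea, assuming the frame is called analysis1 as in the question, that it is already sorted by horse and then by course, and that every horse has exactly one row per course. The toy data frame below only reproduces the eight rows shown in the question; the names courses and wide are just illustrative.

```r
# Toy data reproducing the eight rows shown in the question
analysis1 <- data.frame(
  name       = rep(c("A Beetoo", "A Bid In Time"), each = 4),
  sire_name  = rep(c("Bahhare", "Danetime"), each = 4),
  course     = rep(c("Kempton", "Lingfield", "Southwell", "Wolverhampton"),
                   times = 2),
  appearance = c(0, 9, 0, 0, 0, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Course names in the order they occur within each horse's block of rows
courses <- unique(analysis1$course)

# Fill a matrix row by row: each run of length(courses) values is one horse
wide <- as.data.frame(
  matrix(analysis1$appearance, ncol = length(courses), byrow = TRUE)
)
names(wide) <- courses
wide
```

Because nothing is looked up or matched, this only gives the right answer when the rows really are in the assumed order.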
Now, if the frame isn't sorted exactly as you've presented it, or the course column contains more values than shown, the following is more robust:
(you can add other columns to the order function to break ties if you wish)
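A sketch of the sort-first variant, under the same assumption that every horse has exactly one row for every course (missing cells would still break it). The toy rows are deliberately shuffled, sire_name is used as a tie-breaker in order(), and the names sorted, courses and wide are illustrative only.

```r
# Toy data: the eight rows from the question, deliberately shuffled
analysis1 <- data.frame(
  name       = rep(c("A Beetoo", "A Bid In Time"), each = 4),
  sire_name  = rep(c("Bahhare", "Danetime"), each = 4),
  course     = rep(c("Kempton", "Lingfield", "Southwell", "Wolverhampton"),
                   times = 2),
  appearance = c(0, 9, 0, 0, 0, 3, 2, 2),
  stringsAsFactors = FALSE
)
analysis1 <- analysis1[c(3, 8, 1, 6, 4, 5, 2, 7), ]

# Sort by horse (sire_name breaks ties between equal names), then by course
sorted <- analysis1[order(analysis1$name, analysis1$sire_name,
                          analysis1$course), ]

# Column labels in the same sorted order as the courses within each horse
courses <- sort(unique(sorted$course))

# Guard: the matrix trick needs a complete block of rows for every horse
stopifnot(nrow(sorted) %% length(courses) == 0)

wide <- as.data.frame(
  matrix(sorted$appearance, ncol = length(courses), byrow = TRUE)
)
names(wide) <- courses
wide
```

The guard only catches a row count that is not a multiple of the number of courses; it does not detect a horse that is missing one course while another has a duplicate.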
One of the great advantages of the many commands that can be used to reshape your data is that they are robust to the order of the data, missing cells, etc. However, this also makes them generally quite slow when you've got a straightforward reshape on a very large data.frame object.