My dataframe looks like this:
ID | value A | value B
1 | A1 | F
1 | A2 | N
1 | A3 | B
1 | A4 | S
2 | A1 | B
2 | A2 | G
2 | A3 | N
3 | A1 | F
3 | A2 | H
3 | A3 | J
3 | A4 | N
So each ID should have 4 rows. I am trying to use the dcast() function, but it only works if all IDs have the same number of rows. ID 2 would be an error case in this example. Is there an easy way to find all IDs that have more or fewer than 4 rows?
Or is there maybe a way to make the dcast() function ignore the error cases?
What I am ultimately trying to do is reshape the dataframe to get something like this:
ID | A1 | A2 | A3 | A4
1 | F | N | B | S
2 | B | G | N | NA
3 | F | H | J | N
Apparently the dcast() function from the reshape2 package doesn't work with irregular IDs. It gives me the following message: 'Aggregation function missing: defaulting to length'. But with a smaller part of my dataset, which doesn't have those irregular IDs, it works. Any ideas?
Or maybe an idea how to reshape my dataframe without using dcast()? Thanks!
I am working on a Mac with the following (package) versions:
sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.1 plyr_1.7.1
loaded via a namespace (and not attached):
[1] stringr_0.6
The values in the first column are all integers; the other columns hold character values.
sapply(x, class)
ID fach01 f01_lp
"integer" "character" "character"
As for the reproducible example:
I hope this helps (I used my original dataframe). If I only use the first 500 rows of the dataframe, dcast() works perfectly fine; the problem occurs when I try to use the whole dataframe of about 140000 rows.
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L), A = c("2.LF",
"1.LF", "3.PF", "4.PF", "3.PF", "1.LF", "2.LF", "3.PF",
"4.PF", "1.LF", "2.LF", "3.PF", "1.LF", "4.PF", "2.LF", "1.LF",
"2.LF", "4.PF", "3.PF", "1.LF", "3.PF", "2.LF", "4.PF", "3.PF",
"4.PF", "1.LF", "2.LF", "4.PF", "2.LF", "3.PF", "1.LF", "1.LF",
"2.LF", "3.PF", "4.PF"), B = c("Mu/Ku",
"Fs", "2.AF", "NW", "DE", "2.AF", "MA", "Fs", "2.AF", "NW",
"NW", "Fs", "2.AF", "bel", "NW", "Fs", "bel", "bel", "NW", "DE",
"2.AF", "2.AF", "MA", "Fs", "2.AF", "MA", "NW", "DE", "2.AF",
"MA", "NW", "Mu/Ku", "Fs", "2.AF", "NW")), .Names = c("ID", "A", "B"
), row.names = c(NA, -35L), class = "data.frame")
In my original dataframe the values A1 to A4 (here called 1.LF, 2.LF, 3.PF and 4.PF) are not in the right order; sorting them into columns is what I want dcast() to do (same as above, shown here for the first three IDs):
ID | 1.LF | 2.LF | 3.PF | 4.PF
1 | Fs | Mu/Ku | 2.AF | NW
2 | 2.AF | MA | DE | <NA>
3 | NW | NW | Fs | 2.AF
EDIT:
I didn't solve the dcast() problem, but I found a way to work around it, using the reshape() function from base R (the stats package, not the reshape package):
df <- reshape(df, idvar='ID', varying = NULL, timevar = 'value A', direction='wide')
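With the column names of the example data frame above (ID, A, B), the same workaround would look roughly like the following. This is only a sketch on the first three IDs of the example. The final column reordering is an extra step added here for illustration, since reshape() creates the new columns in the order in which the values first appear in A.

```r
# First three IDs of the example data above; ID 2 has only three rows.
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  A  = c("2.LF", "1.LF", "3.PF", "4.PF", "3.PF", "1.LF", "2.LF",
         "3.PF", "4.PF", "1.LF", "2.LF"),
  B  = c("Mu/Ku", "Fs", "2.AF", "NW", "DE", "2.AF", "MA",
         "Fs", "2.AF", "NW", "NW"),
  stringsAsFactors = FALSE
)

# Long to wide: one row per ID, one column per distinct value of A.
# Missing ID/A combinations (here: ID 2, "4.PF") are filled with NA.
wide <- reshape(df, idvar = "ID", timevar = "A", direction = "wide")

# The new columns are named "B.<value of A>"; sort them alphabetically
# so that they appear as B.1.LF, B.2.LF, B.3.PF, B.4.PF.
wide <- wide[, c("ID", sort(setdiff(names(wide), "ID")))]
wide
```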
table() and which() would certainly be the answer to the first question.
As for the second question, maybe you should post the code that is generating the error. At the moment it's not clear what you are trying (and failing) to do.
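For the first question, a minimal sketch of the table() and which() approach could look like this. It uses only the first three IDs of the example data, and the variable names counts and irregular_ids are just illustrative.

```r
# First three IDs of the example data; ID 2 has only three rows.
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  A  = c("2.LF", "1.LF", "3.PF", "4.PF", "3.PF", "1.LF", "2.LF",
         "3.PF", "4.PF", "1.LF", "2.LF"),
  B  = c("Mu/Ku", "Fs", "2.AF", "NW", "DE", "2.AF", "MA",
         "Fs", "2.AF", "NW", "NW"),
  stringsAsFactors = FALSE
)

# Count the rows per ID.
counts <- table(df$ID)

# IDs whose row count differs from the expected 4.
irregular_ids <- as.integer(names(counts)[which(counts != 4)])
irregular_ids

# The rows belonging to those IDs, for inspection.
df[df$ID %in% irregular_ids, ]
```

Replacing `counts != 4` with `counts == 4` and keeping only those IDs would give a regular subset of the data.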
EDIT:
If I convert the factor variables to character variables I can get dcast() to return the correct object, although my error is different from yours. I got the error in both reshape2 1.1 and reshape2 1.2.1 on R 2.14.1 on a Mac.
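A rough sketch of that conversion, again on the first three IDs of the example data. The dcast() call assumes a reshape2 version whose dcast() accepts the value.var argument, and it is skipped if reshape2 is not installed. The duplicate check is my own addition: as far as I know, dcast() prints 'Aggregation function missing: defaulting to length' when some ID/A combination occurs more than once, so it may be worth ruling that out as well.

```r
# First three IDs of the example data, with A and B stored as factors.
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  A  = c("2.LF", "1.LF", "3.PF", "4.PF", "3.PF", "1.LF", "2.LF",
         "3.PF", "4.PF", "1.LF", "2.LF"),
  B  = c("Mu/Ku", "Fs", "2.AF", "NW", "DE", "2.AF", "MA",
         "Fs", "2.AF", "NW", "NW"),
  stringsAsFactors = TRUE
)

# Convert every factor column to character; other columns are left alone.
is_fac <- sapply(df, is.factor)
df[is_fac] <- lapply(df[is_fac], as.character)
sapply(df, class)

# Flag rows whose ID/A combination occurs more than once.
key <- df[c("ID", "A")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
df[dup, ]

# Reshape to wide format, naming the value column explicitly.
if (requireNamespace("reshape2", quietly = TRUE)) {
  wide <- reshape2::dcast(df, ID ~ A, value.var = "B")
  print(wide)
}
```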
EDIT2: As it turned out, the bug was fixed in the newest version of plyr. I get no error with reshape2 1.2.1 running with plyr 1.7. You should update those two packages as well and restart with a fresh session.