I want to do some statistical analysis in R on a column that has the same name but different lengths in several data frames. I created a list:
my.list <- list(df1, df2, df3, df4)
Now, as some elements of the column of interest (say: my.col) contain the word “FAILED” instead of numbers, I replace it by ‘NA’:
for (i in 1:length(my.list)){
  for (j in 1:length(my.list[[i]]$my.col)){
    if (my.list[[i]]$my.col[j] %in% c("FAILED")) {
      my.list[[i]]$my.col[j] <- 'NA'
    }
  }
}
I am pretty sure that this is not the best solution for the problem, but at least it works. However, it causes warnings that another column (not my.col) contains invalid factor levels that have been replaced by ‘NA’. I have no idea why it considers columns other than my.col at all. Suggestions for improvement are highly appreciated.
Now, the remaining numbers contain a decimal comma instead of a point. Although I tried to eliminate this problem while importing the .csv file with dec=",", this does not work for columns that contain anything other than numbers (e.g. “FAILED”). So I have to replace the comma with a point, and this is what doesn’t work for me. I tried:
for (i in 1:length(my.list)){
  as.numeric(gsub(",", ".", my.list[[i]]$my.col))
}
This doesn’t give any errors, but it also doesn’t change anything, although if I type in e.g.
as.numeric(gsub(",", ".", my.list[[4]]$my.col))
it does what I want to do for the 4th element of the list. From my point of view, both should be the same. What’s the problem with this?
Btw, I prefer not to delete the other columns from the data frames because I might need them in future for other analyses.
You can do this efficiently using the plyr package.
Note that in the example, I use the built-in iris data.
Instead of replacing “FAILED” with NA, I replaced values of “versicolor”.
Instead of replacing a comma with a period, I replaced an s with a w.
The as.character was added as an example of a way to circumvent problems with adding a level to a factor. The as.factor ensures the column is returned as a factor with the new levels.
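The points above can be sketched roughly as follows. This is a minimal illustration rather than a definitive version: it assumes the plyr package is installed, uses three copies of iris as a stand-in for list(df1, df2, df3, df4), and the helper name clean_species is made up for this sketch.

```r
# Sketch only: requires the plyr package (install.packages("plyr")).
library(plyr)

# Stand-in for list(df1, df2, df3, df4): three copies of the built-in iris data
my.list <- list(iris, iris, iris)

# Hypothetical helper that cleans one column and leaves the others untouched
clean_species <- function(df) {
  # Work on a character copy so new values are not rejected as invalid factor levels
  x <- as.character(df$Species)
  # Analogue of replacing "FAILED" with a missing value
  x[x == "versicolor"] <- NA
  # Analogue of replacing the decimal comma with a point
  x <- gsub("s", "w", x)
  # Return the column as a factor with the new levels
  df$Species <- as.factor(x)
  df
}

# llply applies the function to every list element and returns a list
my.list.clean <- llply(my.list, clean_species)

levels(my.list.clean[[1]]$Species)
```

Because the function returns the whole data frame, all other columns are kept as they were.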
This will also give you the flexibility to convert from list to data.frame. Simply replace llply with ldply.
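Applied to the data in the question, the same idea might look like the sketch below. The loop in the question changes nothing because the result of as.numeric(gsub(...)) is never assigned back to the list, and values computed inside a for loop are not printed the way they are at the console. The sketch uses base R's lapply instead of llply so that it runs without plyr. The two small data frames and the helper name fix_col are hypothetical and only serve as an illustration.

```r
# Hypothetical stand-ins for df1, df2, ...; my.col holds text with decimal commas
df1 <- data.frame(id = 1:3, my.col = c("1,5", "FAILED", "2,25"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = 1:2, my.col = c("FAILED", "10,0"),
                  stringsAsFactors = FALSE)
my.list <- list(df1, df2)

# Hypothetical helper: clean my.col and keep every other column as it is
fix_col <- function(df) {
  x <- as.character(df$my.col)   # also works if my.col was read in as a factor
  x[x == "FAILED"] <- NA         # a real missing value, not the string 'NA'
  # Replace the decimal comma with a point, then convert to numeric
  df$my.col <- as.numeric(gsub(",", ".", x, fixed = TRUE))
  df
}

# Assign the result back; without the assignment nothing would change
my.list <- lapply(my.list, fix_col)

str(my.list[[1]])
```

Using NA rather than the string 'NA' means as.numeric does not have to coerce any non-numeric text, and vectorised indexing removes the need for the inner loop over rows.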