I’ve written a script based on a for-loop that reads columns from multiple .xls files, combines them into a single data frame, searches for negative values, and writes a .txt file containing these values and the name of the file they came from.
The script basically works, but I have several hundred files to process and it’s a bit slow. This version is only a basic framework for later statistical analysis, and I want to parallelize the execution to speed it up.
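One way to parallelize per-file work in R is the base `parallel` package. The sketch below is a minimal illustration under stated assumptions, not a drop-in replacement. `read_one` is a hypothetical stand-in that returns fixed values so the pattern can be run without any .xls files; in the real script it would call `readWorksheetFromFile()`. `parLapply()` with a PSOCK cluster also works on Windows (where `choose.dir()` is available), whereas `mclapply()` relies on forking and cannot run in parallel there. With XLConnect, each worker process has to load the package itself and starts its own Java VM, so memory use grows with the number of workers.

```r
library(parallel)

# Hypothetical per-file reader: returns a fixed one-row data frame.
# In the real script this would wrap readWorksheetFromFile(path, sheet = "Output Data", ...).
read_one <- function(path) {
  data.frame(PCr = -1, bATP = 2, Pi = 3, Name = path, stringsAsFactors = FALSE)
}

files <- c("a.xls", "b.xls", "c.xls")  # would be list.files(pattern = "\\.xls$")

# Use at most 2 workers, leave one core free, and never fewer than 1
# (detectCores() can return NA, which na.rm = TRUE drops).
n_workers <- max(1L, min(2L, detectCores() - 1L, na.rm = TRUE))
cl <- makeCluster(n_workers)
# With XLConnect, each worker would also need the package loaded:
# clusterEvalQ(cl, library(XLConnect))
pieces <- parLapply(cl, files, read_one)  # one data frame per file
stopCluster(cl)

df <- do.call(rbind, pieces)  # bind all pieces in a single call
neg_val <- subset(df, bATP < 0 | Pi < 0 | PCr < 0)
```

How much this helps depends on whether reading is CPU-bound or disk-bound. If all the files sit on one local disk, the speed-up may be smaller than the number of workers suggests.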
I’ve tried to avoid the for-loop by applying the function with lapply and the plyr package, but ran into problems passing the file list to readWorksheetFromFile (Error in path.expand(filename) : invalid ‘path’ argument).
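In base R, `path.expand()` raises “invalid ‘path’ argument” when it receives something other than a character vector, for example an index from `seq_along()` or a list. A loop-free version therefore needs a function that takes exactly one file name and returns that file’s data frame. `lapply()` then calls this function once per element of `input`, and the pieces are bound together once at the end. In the sketch below, `read_one_file`, `combine_files` and `fake_reader` are hypothetical names. The reader is passed in as an argument so the pattern can be tried without Excel files, and the commented-out lines show where the `readWorksheetFromFile()` call from the script would go.

```r
# Hypothetical helper: read ONE file and tag its rows with the file name.
# `reader` is any function that takes a single path and returns a 3-column data frame.
read_one_file <- function(path, reader) {
  # fail early if a list, an index or several paths are passed in
  stopifnot(is.character(path), length(path) == 1)
  data <- reader(path)
  colnames(data) <- c("PCr", "bATP", "Pi")
  data$Name <- path
  data
}

# Hypothetical helper: apply read_one_file to every path, then bind once.
combine_files <- function(paths, reader) {
  pieces <- lapply(paths, read_one_file, reader = reader)
  do.call(rbind, pieces)
}

# With XLConnect, the reader would wrap the call from the script, e.g.:
# xls_reader <- function(path) {
#   data.frame(readWorksheetFromFile(path, sheet = "Output Data",
#     startRow = 2, startCol = c(10, 13, 16), endCol = c(10, 13, 16),
#     header = TRUE))
# }
# df <- combine_files(list.files(pattern = "\\.xls$"), xls_reader)

# Stand-in reader so the pattern can be run without any .xls files.
fake_reader <- function(path) {
  data.frame(a = c(1, -2), b = c(3, 4), c = c(-5, 6))
}
df <- combine_files(c("run1.xls", "run2.xls"), fake_reader)
neg_val <- subset(df, bATP < 0 | Pi < 0 | PCr < 0)
```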
Here is the working script:
library(XLConnect)
setwd(choose.dir())
# all .xls files in the chosen directory
input = list.files(pattern = "\\.xls$")
# collect one data frame per file in a list and combine them once after the loop;
# growing df with rbind() inside the loop copies the whole data frame every time
results = vector("list", length(input))
for(i in seq_along(input)){
  # read columns 10, 13 and 16 of sheet "Output Data", starting at row 2
  data = data.frame(readWorksheetFromFile(input[i], sheet="Output Data",
    startRow=2, startCol=c(10, 13, 16), endCol=c(10, 13, 16), header=TRUE))
  # drop the last row; head() returns a copy, so the result must be assigned
  data = head(data, n = -1L)
  colnames(data) = c("PCr", "bATP", "Pi")
  data$Name = input[i]
  results[[i]] = data
}
df = do.call(rbind, results)
# searches for negative values in df and writes them to a txt file
neg_val = subset(df, bATP<0 | Pi<0 | PCr<0)
write.table(neg_val, file = "neg_val.txt", sep = "\t", quote=FALSE)
Any solutions to this problem, or other suggestions to speed up execution?
Thanks,
Markus
I still don’t know why Martin’s code doesn’t work on my data, but I’ve found another solution. In a first test it was about 4x faster than my original approach.
Thanks to all and best regards,
Markus