I have a function, readnorm, which returns a list of related data from a file identified by an integer:
readnorm <- function(n) {
  a <- read.csv(paste("/tmp/diff-a-", n, ".txt", sep=""),
                col.names=c("raw"), header=FALSE)
  a <- list(n=n, raw=a$raw, median=median(a$raw), iqr=IQR(a$raw))
  a$shifted <- a$raw - a$median
  a$scaled <- a$raw / a$iqr
  a$normed <- a$shifted / a$iqr
  a$necdf <- ecdf(a$normed)
  return(a)
}
I can build a list containing data from a set of files by using lapply:
> ns = c(5,6,7,8,9,10,15,20,25,30)
> data <- lapply(ns, readnorm)
> ls(data[[1]])
[1] "iqr" "median" "n" "necdf" "normed" "raw" "scaled"
[8] "shifted"
Now, what I would like to do is construct from that a set of data frames, called normed, scaled, etc., which group the corresponding entries from the components of data (the names could be the values of n, if integer names are allowed in R, so that normed$5 contains the normed values read from the file with n = 5, and so on).
Does that make sense? This way I can plot all the raw data by using the raw data frame, for example. It is a bit like turning the data structure I have “inside out”.
I am new to R so may be doing something very wrong. In higher-level terms, I believe that the data in the different files are from similar distributions, shifted and scaled, and I want to explore that hypothesis. The code above is my attempt to arrange things so that I can do so in a systematic manner.
So my main question is how to generate the data frames, but I am also interested in more general guidance about how to tackle this problem (how to manage the data – I know about tools like qqplot that will help with the analysis itself).
I agree with the comment that you will be happier using lapply rather than sapply. sapply is doing some simplifying that is actually complicatifying things for you.
More generally, if it were me, I’d do less computation in my function that reads the data, and save the processing for later, once the raw data have been placed in a single structure. For instance:
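One way this might look is sketched below. It is only a minimal sketch: readraw, alldata and the stand-in files written to tempdir() are hypothetical names and data introduced for illustration, and the real files under /tmp are assumed to have the same single-column layout as in the question.

```r
# Stand-in data so the sketch runs on its own: one single-column file per n.
# (Assumption: the real /tmp/diff-a-N.txt files look like this.)
dir <- tempdir()
ns <- c(5, 6, 7)
set.seed(1)
for (n in ns) {
  write.table(rnorm(20, mean = n, sd = n / 5),
              file = file.path(dir, paste0("diff-a-", n, ".txt")),
              row.names = FALSE, col.names = FALSE)
}

# Hypothetical reader: returns only the raw values, tagged with n.
readraw <- function(n, dir) {
  a <- read.csv(file.path(dir, paste0("diff-a-", n, ".txt")),
                col.names = "raw", header = FALSE)
  data.frame(n = n, raw = a$raw)
}

# Stack everything into one long data frame: one row per observation.
alldata <- do.call(rbind, lapply(ns, readraw, dir = dir))

# Per-file median and IQR, repeated so they line up with each row.
alldata$median <- ave(alldata$raw, alldata$n, FUN = median)
alldata$iqr    <- ave(alldata$raw, alldata$n, FUN = IQR)

# The derived quantities are now plain column arithmetic.
alldata$shifted <- alldata$raw - alldata$median
alldata$scaled  <- alldata$raw / alldata$iqr
alldata$normed  <- alldata$shifted / alldata$iqr

# The "inside out" view: one list per quantity, with entries named by n.
normed <- split(alldata$normed, alldata$n)
raw    <- split(alldata$raw, alldata$n)
normed[["5"]]
```

A list is used for normed and raw rather than a data frame because the files need not contain the same number of rows, and data frame columns must all have the same length. The long alldata frame can also be passed directly to formula-based plotting functions, for example boxplot(normed ~ n, data = alldata).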
I’m not sure what you plan on doing with the output of ecdf, so I’ll just note that ecdf() returns a function (just in case you didn’t realize that).
Finally, see ?make.names for a description of what’s allowed for names.
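Both of these points can be checked quickly at the prompt. The values in x and the list l below are made up purely for illustration:

```r
x <- c(3, 1, 2, 5, 4)

# ecdf() returns a function, which can then be evaluated at any point.
Fn <- ecdf(x)
is.function(Fn)   # TRUE
Fn(2.5)           # 0.4, since two of the five values are <= 2.5

# make.names() shows how R would turn strings into syntactically valid names.
make.names(c("5", "n5", "a b"))   # "X5" "n5" "a.b"

# A name like "5" is still usable in a list, but it has to be quoted
# with backticks (or accessed via [[ ]]) because it is not syntactic.
l <- list()
l[["5"]] <- x
l$`5`
```

So normed$5 as written will not parse, but normed$`5` or normed[["5"]] will work on a list whose entries are named by n.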