A question that is undoubtedly easy to solve for an R expert.
I need to repeat a number of functions on dataframes that are sequentially labeled (before merging them all together). For example, I might need to do the following:
# READ IN DATAFILES & LABEL DF'S
df1 <- read.csv(file="file_A.csv",head=TRUE)
df2 <- read.csv(file="file_B.csv",head=TRUE)
df3 <- read.csv(file="file_C.csv",head=TRUE)
# TURN DF'S INTO DATA TABLES
df1<-data.table(df1)
df2<-data.table(df2)
df3<-data.table(df3)
# CHANGE VARIABLE TO POSIX
df1$date <-as.POSIXct(df1$date, format = "%Y-%m-%d %H:%M:%S")
df2$date <-as.POSIXct(df2$date, format = "%Y-%m-%d %H:%M:%S")
df3$date <-as.POSIXct(df3$date, format = "%Y-%m-%d %H:%M:%S")
# FILTER BY DATE RANGE
date_filter<-as.POSIXct("2012-01-01 01:01:01")
df1<-subset(df1, df1$date>date_filter)
df2<-subset(df2, df2$date>date_filter)
df3<-subset(df3, df3$date>date_filter)
# AGGREGATE OVER A UNIQUE ID
df1<-df1[,(sum(var)), by=list(id)]
df2<-df2[,(sum(var)), by=list(id)]
df3<-df3[,(sum(var)), by=list(id)]
# FINALLY, MERGE TOGETHER
df <-merge(df1,df2, by="id",all=TRUE)
You get the idea, only I need to do this for 25 dataframes, not 3. I have a suspicion that I can make R repeat functions by creating a vector (df_nums<-c(1:25)) and then using a function to loop over all of my data frames, but I don’t know how to do it.
Please help! Thanks!
Edit: Thanks to Arun, I’m up to this for my actual code:
out<- lapply(1:length(files), function(idx) {
df <- as.data.table(read.csv(files[idx], header = TRUE))
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
date_filter <- as.POSIXct("2012-11-13 01:01:01")
df <- subset(df, df$date > date_filter)
df <- df[, .N, by = list(id)]
})
out<-data.table(out)
out.merge <- Reduce(function(...) merge(..., by="id", all=T), out)
Edit 2: After running the code above, I appear to have data.tables nested in out. For example,
> head(out)
out
1: <data.table>
2: <data.table>
3: <data.table>
4: <data.table>
5: <data.table>
6: <data.table>
How do I access these data.tables to see if everything worked correctly?
You can use list.files to obtain all the CSV files from the directory and use lapply to loop over them.
You can use do.call(rbind, out) or do.call(cbind, out) to bind all results by row or by column.
Edit: After @roody’s question about an outer join. Something like this?
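A minimal sketch of the list.files / lapply / Reduce approach, not necessarily the exact code that was posted. It assumes each CSV has the columns id, date and var from the question. The temporary demo files, the total_<idx> column names and the UTC time zone are made up for illustration. It is written in base R (aggregate instead of the data.table aggregation) so that it runs without extra packages; the data.table steps from the question can be used inside the function in the same way.

```r
# Hypothetical demo data: three small CSV files in a temporary directory.
# File k contains ids k and k+1, so the files only partly overlap.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
for (k in 1:3) {
  demo <- data.frame(
    id   = c(k, k, k + 1),
    date = c("2012-11-12 00:00:00", "2012-11-14 00:00:00", "2012-11-15 00:00:00"),
    var  = c(10, 20, 30)
  )
  write.csv(demo, file.path(demo_dir, paste0("file_", LETTERS[k], ".csv")),
            row.names = FALSE)
}

# Collect every CSV file in the directory (returned in alphabetical order)
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Time zone fixed to UTC here so that parsing and comparison agree
date_filter <- as.POSIXct("2012-11-13 01:01:01", tz = "UTC")

# Apply the same steps to each file; 'out' is a plain list, one result per file
out <- lapply(seq_along(files), function(idx) {
  df <- read.csv(files[idx], header = TRUE, stringsAsFactors = FALSE)
  df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df <- df[df$date > date_filter, ]                  # keep rows after the cutoff
  res <- aggregate(var ~ id, data = df, FUN = sum)   # sum var within each id
  names(res)[2] <- paste0("total_", idx)             # unique column name per file
  res
})

# Outer join of all results on id; ids missing from a file get NA
merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), out)
print(merged)
```

Because out is left as a plain list here (it is not wrapped in data.table()), an individual result can be inspected with out[[1]], and str(out) gives an overview of all of them. Giving each aggregated column its own name before merging avoids clashing column names once more than two tables are joined.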