I would like to merge two data frames, but do not want to duplicate rows if there is more than one match. Instead I would like to sum the observations on that day.
From ?merge: The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each.
Here’s some example code:
days <- as.data.frame(as.Date(c("2012-1-1", "2012-1-2", "2012-1-3", "2012-1-4")))
names(days) <- "Date"
obs.days <- as.data.frame(as.Date(c("2012-1-2", "2012-1-3", "2012-1-3")))
obs.days$count <- 1
colnames(obs.days) <- c("Date", "Count")
df <- merge(days, obs.days, by.x="Date", by.y="Date", all.x=TRUE)
I would like the final data frame to only list 2012-1-3 one time with a count value of 2.
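I’d suggest you merge them and then aggregate them (essentially perform a SUM for each unique Date).
You already have the merge from your own code. To do the aggregation you could use base R’s aggregate, BUT I’d recommend the package plyr, which is awesome! In particular, the function ddply.
The command ddply(df, .(Date), FUN) essentially splits df into one piece per unique Date, calls FUN on each piece, and binds the results back together into a single data frame.
So a FUN that turns each piece into a one-row data frame with columns Date and Count, with Count being the sum of all counts for that date, gives you the result you want.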
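The code blocks from this answer did not survive, so here is a minimal sketch of the merge-then-aggregate idea using the example data from the question. It is an illustration, not the original code: the names res.base, res.plyr and piece are made up here, and treating days with no observations as a count of 0 (via na.rm = TRUE) is an assumption about what is wanted. The plyr part only runs if the package is installed.

```r
# Example data from the question
days <- data.frame(Date = as.Date(c("2012-1-1", "2012-1-2", "2012-1-3", "2012-1-4")))
obs.days <- data.frame(Date = as.Date(c("2012-1-2", "2012-1-3", "2012-1-3")),
                       Count = 1)

# Step 1: merge, keeping every day; days without observations get Count = NA,
# and 2012-01-03 appears twice because it matches two rows
df <- merge(days, obs.days, by = "Date", all.x = TRUE)

# Step 2a (base R): sum Count within each Date
# na.rm = TRUE is passed on to sum(), so all-NA days come out as 0 (assumption)
res.base <- aggregate(df["Count"], by = df["Date"], FUN = sum, na.rm = TRUE)

# Step 2b (plyr): same idea with ddply; "Date" is equivalent to .(Date)
if (requireNamespace("plyr", quietly = TRUE)) {
  res.plyr <- plyr::ddply(df, "Date", function(piece) {
    # piece holds all rows for one Date; collapse it to a single row
    data.frame(Date = piece$Date[1], Count = sum(piece$Count, na.rm = TRUE))
  })
}

print(res.base)
```

With this data, res.base has one row per day and 2012-01-03 is listed once with a Count of 2. If you would rather keep NA for days with no observations, drop na.rm = TRUE.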