I need to summarize a data frame by some variables, ignoring the others. This is sometimes referred to as collapsing. E.g. if I have a dataframe like this:
Widget Type Energy
egg 1 20
egg 2 30
jap 3 50
jap 1 60
Then collapsing by Widget, with Energy the dependent variable, Energy~Widget, would yield
Widget Energy
egg 25
jap 55
In Excel the closest functionality might be “Pivot tables”, and I’ve worked out how to do it in Python (http://alexholcombe.wordpress.com/2009/01/26/summarizing-data-by-combinations-of-variables-with-python/). Here’s an example in R that uses the doBy library to do something closely related (http://www.mail-archive.com/r-help@r-project.org/msg02643.html), but is there an easy way to do the above? Even better, is there anything built into the ggplot2 library to create plots that collapse across some variables?
Use aggregate to summarize across a factor.
For more flexibility look at the tapply function and the plyr package.
In ggplot2, use stat_summary to summarize.
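As a minimal sketch of the aggregate and tapply suggestions, using the data frame from the question: the variable names widgets, collapsed, and means are made up for illustration, and mean is assumed as the summary function because it reproduces the 25 and 55 shown in the question.

```r
# Rebuild the example data frame from the question.
widgets <- data.frame(
  Widget = c("egg", "egg", "jap", "jap"),
  Type   = c(1, 2, 3, 1),
  Energy = c(20, 30, 50, 60)
)

# Collapse across Type: one row per Widget with the mean Energy.
# The formula reads "Energy summarized by Widget".
collapsed <- aggregate(Energy ~ Widget, data = widgets, FUN = mean)
print(collapsed)

# tapply computes the same per-Widget means, returned as a named vector.
means <- tapply(widgets$Energy, widgets$Widget, mean)
print(means)
```

To collapse by more than one variable, add terms to the right-hand side of the formula, for example Energy ~ Widget + Type.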
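For the plotting part, something along these lines should work. It is a sketch, not the only way to do it: the point geom and mean are illustrative choices, and the fun argument assumes ggplot2 3.3.0 or later (older versions call it fun.y).

```r
library(ggplot2)

# Same example data as in the question.
widgets <- data.frame(
  Widget = c("egg", "egg", "jap", "jap"),
  Type   = c(1, 2, 3, 1),
  Energy = c(20, 30, 50, 60)
)

# stat_summary computes the mean Energy at each Widget before drawing,
# so Type is collapsed without pre-aggregating the data frame.
p <- ggplot(widgets, aes(x = Widget, y = Energy)) +
  stat_summary(fun = mean, geom = "point", size = 3)
print(p)
```

Swapping geom = "point" for geom = "bar" draws the same summary as bars.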