I probably haven’t defined the problem very well, and I don’t seem to understand what R returns from sapply. I have a large data frame of hierarchical data. About half the columns are factors and half are numeric. I want a new data frame that keeps some of the factors and sums the numeric columns, with the sums kept separate for each combination of factor levels.
For instance, from the sample data below, I’d like a data frame that keeps state, district and branch as they are, but sums the data for orders of the same type across the different colours. I’m thinking that repeated use of sapply will do it, but I can’t get it to work.
Sample data:
state  district  branch    order   colour  number  cost   amount
CA     central   newtown   shoes   black        6  25.50  127.40
CA     central   newtown   shoes   brown        3  32.12   75.40
CA     central   newtown   gloves  blue        15  12.20  157.42
CA     central   newtown   gloves  black        9   8.70   65.37
CA     central   columbus  shoes   black       12  30.75  316.99
CA     central   columbus  shoes   brown        1  40.98   45.00
CA     central   columbus  gloves  blue        47  11.78  498.32
CA     central   columbus  gloves  black       23   7.60  135.50
Another job for aggregate, using its formula interface (call your data frame dat). On the left side of the ~, cbind is used to indicate that we want each column summarised separately. If cost + amount were specified instead, it would mean the arithmetic sum of the two columns, because they are numeric. On the right side of the ~, we have factors, so the + means that we are aggregating by each combination of levels of those factors.
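A call matching this description might look like the sketch below. The columns inside cbind, the grouping factors and FUN = sum are assumptions taken from the question rather than a quotation of the original call, and dat is built here from the sample data only so that the example runs on its own.

```r
# Sample data from the question, read into a data frame called dat.
dat <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
state district branch order colour number cost amount
CA central newtown shoes black 6 25.50 127.40
CA central newtown shoes brown 3 32.12 75.40
CA central newtown gloves blue 15 12.20 157.42
CA central newtown gloves black 9 8.70 65.37
CA central columbus shoes black 12 30.75 316.99
CA central columbus shoes brown 1 40.98 45.00
CA central columbus gloves blue 47 11.78 498.32
CA central columbus gloves black 23 7.60 135.50
")

# cbind() on the left: number, cost and amount are each summed separately.
# Factors on the right: one output row per state/district/branch/order
# combination. colour is left out, so the sums run across colours.
res <- aggregate(cbind(number, cost, amount) ~ state + district + branch + order,
                 data = dat, FUN = sum)
print(res)

# For contrast: cost + amount on the left is evaluated arithmetically,
# giving a single summed column instead of two separate ones.
combined <- aggregate(cost + amount ~ state + district + branch + order,
                      data = dat, FUN = sum)
print(combined)
```

Leaving colour off the right-hand side is what collapses the black, brown and blue rows into one row per order type within each branch.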