I have a data frame with data like the following:
Count  Amount  Org  Bank
------------------------------------------
1      100     ABC  Chase
15     76      DEF  American Express
...
When I run ddply using:
result1 <- ddply(df, 4, count = sum(as.numeric(df[[1]])), amt = sum(as.numeric(df[[2]])))
I get a result in which every row of result1 has the same count and amt values, i.e.
description       count  amt
Chase             900    432087
American Express  900    432087
.....
which is definitely not correct. It seems as though the last sum() value calculated is being applied to every row. Am I missing something here?
There are a few problems here:
1. You are getting the same (wrong) result because you refer back to the original data frame df in the arguments to ddply, e.g. df[[1]]. ddply doesn't work like that: df[[1]] is always the whole column, regardless of grouping, so every group ends up with the same total. Use column names directly, e.g. Amount and Count.
2. You are missing the .fun function argument to ddply; in this case summarize is appropriate. (I honestly don't know how your code worked at all without this.)
3. You are using an undocumented way (4) to select group columns in the .variables argument. Try .(Bank) or c("Bank") instead.

This should work:
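
Here is a minimal sketch of the fixes described above applied together. It assumes the plyr package is installed. The sample data frame is made up for illustration and is not your real data.

```r
library(plyr)

# Hypothetical sample data (assumption: values invented for illustration only).
df <- data.frame(
  Count  = c(1, 15, 4, 10),
  Amount = c(100, 76, 50, 24),
  Org    = c("ABC", "DEF", "GHI", "JKL"),
  Bank   = c("Chase", "American Express", "Chase", "American Express"),
  stringsAsFactors = FALSE
)

# Referring to df[[1]] always yields the whole-column total,
# no matter how the data frame has been split into groups.
whole_total <- sum(as.numeric(df[[1]]))

# Group by Bank, pass summarize as .fun, and refer to columns by name
# so that each sum is computed within its own group.
result1 <- ddply(df, .(Bank), summarize,
                 count = sum(as.numeric(Count)),
                 amt   = sum(as.numeric(Amount)))

print(result1)
```

With this form each bank gets its own count and amt, because Count and Amount are evaluated inside each group's subset rather than against the full df.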