I am trying to solve this in R. I know I would have done it by now in Excel, but I really want to learn how to use R.
My dataframe looks like this:
OBJECTID CDUID SENSI_FINA FREQUENCY SUM_LENGTH
6 5915 1 51 19178
7 5915 2 97 21536
8 5915 3 201 35640
9 5915 4 551 170549
10 5915 5 308 145126
11 5917 1 210 28104
12 5917 2 1897 249379
Now I would like to sum SUM_LENGTH per CDUID and then calculate what percentage the SUM_LENGTH with SENSI_FINA = 5 is of that per-CDUID sum.
So in simple words I want to do this:
(145126/(19178+21536+35640+170549))*100
for CDUID = 5915, then the same for the next one (5917), and so on.
What I did so far is I calculated the sum based on the CDUID:
CDlenght <- aggregate(SUM_LENGTH ~ CDUID, data = step1, FUN = sum)
but now I’m stuck…:-(
I would use data.table or ddply for this.
I think the data.table syntax is more elegant, and it is more memory efficient.
The := operator assigns by reference within DT (so the percent column will now be in DT).
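The code that went with this answer did not survive, so here is a minimal sketch of what the data.table approach could look like. It assumes the data frame is called step1 (as in the question) and rebuilds it from the rows shown there. The names res_total, res_other and percent_vs_others are made up for illustration. Note that the formula in the question divides by the sum of the other classes only, while the wording asks for the share of the per-CDUID total, so both variants are shown.

```r
library(data.table)

# Sample rows copied from the question; replace with your real step1.
step1 <- data.frame(
  OBJECTID   = 6:12,
  CDUID      = c(5915, 5915, 5915, 5915, 5915, 5917, 5917),
  SENSI_FINA = c(1, 2, 3, 4, 5, 1, 2),
  FREQUENCY  = c(51, 97, 201, 551, 308, 210, 1897),
  SUM_LENGTH = c(19178, 21536, 35640, 170549, 145126, 28104, 249379)
)

DT <- as.data.table(step1)

# Add each row's share of its CDUID total; := modifies DT in place.
DT[, percent := 100 * SUM_LENGTH / sum(SUM_LENGTH), by = CDUID]

# Share of class 5 in the per-CDUID total (only CDUIDs that have a class 5 row).
res_total <- DT[SENSI_FINA == 5, .(CDUID, percent)]

# Variant matching the formula in the question:
# class 5 relative to the sum of all other classes in the same CDUID.
res_other <- DT[, .(percent_vs_others =
                      100 * sum(SUM_LENGTH[SENSI_FINA == 5]) /
                            sum(SUM_LENGTH[SENSI_FINA != 5])),
                by = CDUID]
```

In the sample rows CDUID 5917 has no class 5 row, so it is absent from res_total and gets 0 in res_other. Pick whichever denominator matches what you actually mean by "percentage".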