I am using subset to extract rows matching a set of column values
I have a DF with several columns, one of which is STATE. I need the frequencies for STATE, but only for a given set of states. So I have this:
tmp <- subset(DF, DF$STATE %in% SOMESTATES)
a <- as.data.frame(table(tmp$STATE))
This is almost fine. The tmp data frame has only the records belonging to the SOMESTATES set, fine.
The issue is a. The table lists every state in the whole DF, not only those in tmp; the other states appear with zero counts.
My problem is that some states in SOMESTATES have zero appearances in DF, which prevents me from using droplevels, because droplevels also removes those zero values.
If I use droplevels like this, for example, I lose the zero counts for some states that I need:
tmp <- subset(DF, DF$STATE %in% SOMESTATES)
tmp2 <- droplevels(tmp)
table(tmp2$STATE)
presents all the states not only the ones in SOMESTATES
Any advice is appreciated.
Try something like:
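A minimal sketch of that idea, assuming STATE is a factor or character column and SOMESTATES is a character vector; the toy DF below is made up for illustration:

```r
# Toy data, made up for illustration; the names DF, STATE and SOMESTATES
# follow the question.
DF <- data.frame(STATE = factor(c("CA", "CA", "NY", "TX", "TX", "TX", "WA")))
SOMESTATES <- c("NY", "CA", "FL")  # FL never appears in DF

# Keep only the rows whose state is in SOMESTATES.
tmp <- subset(DF, STATE %in% SOMESTATES)

# Rebuild the factor so that its levels are exactly SOMESTATES, in that order.
tmp$STATE <- factor(tmp$STATE, levels = SOMESTATES)

a <- as.data.frame(table(tmp$STATE))
a
#   Var1 Freq
# 1   NY    1
# 2   CA    2
# 3   FL    0
```

TX and WA are gone because they are not among the new levels, while FL is kept with a count of 0.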
The factor function will recreate STATE as a new factor whose levels match SOMESTATES exactly. States that are not in SOMESTATES will not be included, as they are neither in the subsetted data nor in SOMESTATES, but states in SOMESTATES with a count of 0 will still be levels of the factor, and hence appear in the table with 0 counts.
Note also that the factor levels will be ordered as in SOMESTATES, so the order in that vector will be the order that shows up in tables or plots using it. Generally this is benign or useful.