For a dataframe with three columns:
$x, at http://pastebin.com/SGrRUJcA
$y, at http://pastebin.com/fhn7A1rj
$z, at http://pastebin.com/VmVvdHEE
I have the following code to generate a dataframe that can be used to plot a stacked bar plot:
counted <- data.frame(table(myDf$x),variable='x')
counted <- rbind(counted,data.frame(table(myDf$y),variable='y'))
counted <- rbind(counted,data.frame(table(myDf$z),variable='z'))
If I then try to sort the dataframe by its Var1 column, like so:
counted.sort <- sort_df(counted,vars="Var1")
I get a dataframe in which column Var1 now has levels in the following order:
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
"30", "31", "32", "33", "34", "35", "36", "37", "39", "42", "46", "47", "53", "54", "38",
"40", "41", "43", "44", "45", "48", "49", "50"
The result distorts the x-axis of my bar plot (…, "54", "38", "40", "41", "43", …).
How can I get counted.sort to be sorted correctly by ascending number?
NB. Plotting is done as below:
stackedBp <- ggplot(counted,aes(x=Var1,y=Freq,fill=variable))
stackedBp <- stackedBp+geom_bar(stat='identity')+scale_x_discrete('Levels')+scale_y_continuous('Frequency')
stackedBp
As mentioned in a comment, it is R's factor type (see ?factor) that is confusing you.
Let's start with table. This command is for cross-classifying observations into different, well, categories. Implicit here is that the resulting categories are most often going to be categorical rather than numeric.
This is why, when you pass the table to data.frame, the table categories are converted to a factor. Inspect the structure of counted (for example with str(counted)) and you will see that Var1 is a factor.
There is an important distinction between the levels and the labels of factors. Internally, factors are always stored as integer codes, and these codes may not correspond to your labels. The internal codes are totally arbitrary, just for bookkeeping.
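To make the levels-versus-codes distinction concrete, here is a minimal sketch on made-up toy data (the vectors below are hypothetical stand-ins, not the pastebin columns). It builds counted the same way as in the question and shows that rbind appends any new levels to the end, so the internal codes stop matching the numeric order of the labels.

```r
# Hypothetical toy data: two small vectors standing in for myDf$x and myDf$y
counted <- rbind(
  data.frame(table(c(1, 2, 54)), variable = "x"),
  data.frame(table(c(2, 38)), variable = "y")
)

# Var1 is a factor; "38" was appended after "54" when the frames were combined
var1_levels <- levels(counted$Var1)

# Internal integer codes: positions in the level set, not the numbers themselves
var1_codes <- as.integer(counted$Var1)

# Going through character first recovers the numbers the labels spell out
var1_values <- as.numeric(as.character(counted$Var1))

print(var1_levels)
print(var1_codes)
print(var1_values)
```

Calling as.numeric directly on the factor would return the codes rather than the labelled values, which is why the detour through as.character matters.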
The safe way to proceed is to do two coercions, first to character and then to numeric, as in as.numeric(as.character(counted$Var1)), and then explicitly convert the result back to a factor in your ggplot2 code, for example with aes(x = factor(Var1), ...).
Also, keep in mind that sorting the data frame does nothing to change the underlying order of the factor levels in that data frame. To alter that, you need to specify the levels directly, in the order you want, via the levels argument to factor (adding ordered = TRUE if you want an ordered factor). Sorting the data frame simply changes the order in which the rows are stored, not how the levels are ordered.
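The points above can be sketched end to end as follows. This is only an illustration under assumptions: the toy vectors are hypothetical, base order() stands in for sort_df, and the plotting step is guarded so it only runs if ggplot2 is installed.

```r
# Hypothetical toy data reproducing the out-of-order levels
counted <- rbind(
  data.frame(table(c(1, 2, 54)), variable = "x"),
  data.frame(table(c(2, 38)), variable = "y")
)

# Reordering the rows leaves the level order untouched
sorted_rows <- counted[order(counted$Var1), ]
levels_after_sort <- levels(sorted_rows$Var1)

# Two coercions: factor -> character -> numeric
counted$Var1 <- as.numeric(as.character(counted$Var1))

# factor() on a numeric vector sorts the unique values numerically,
# so the levels now come out in ascending order
var1_factor <- factor(counted$Var1)
levels_fixed <- levels(var1_factor)

# Plot with the conversion back to a factor done inside aes()
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  stackedBp <- ggplot(counted, aes(x = factor(Var1), y = Freq, fill = variable)) +
    geom_bar(stat = "identity") +
    scale_x_discrete(name = "Levels") +
    scale_y_continuous(name = "Frequency")
}
```

If a specific non-numeric order is wanted instead, passing it explicitly, as in factor(counted$Var1, levels = ...), gives the same kind of control.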