I am trying to do some analysis of the recent MLB draft with some ggplots in R:
selection <- draft[c("Team","Division","Position")]
head(selection)
Team Division Position
1 pit NL Central P
2 sea AL West P
3 ari NL West P
4 bal AL East P
5 kc AL Central O
6 was NL East I
where P = Pitcher, O = Outfield, etc.
I want to show the number of players selected by each team at each position, within each division:
p <- ggplot(data=selection, aes(x=Team, fill= Position)) + geom_bar(position="stack")
p <- p + coord_flip()
p <- p+ ylab("Players Selected")
p <- p + facet_wrap(~Division)
p
This gets me part of the way there, but the result is very unattractive:
a) The groupings work, but all teams are shown in each division panel, even though only the 5 or 6 teams in each division actually (and correctly) show data.
b) With the coord_flip, the teams are listed in reverse alphabetical order down the page. Can I re-sort them? It would also be nice to have the labels left-justified.
c) How do I set the legend to Pitching, Outfield rather than P and O? Is this a vector I somehow need to set and include?
d) It would also be interesting to see the proportion of each team's selections committed to each type of player. This is accomplished by setting position="fill". Can I set the axis to show % rather than 0 to 1? I also tried adding geom_vline(aes(xintercept=0.5)), and yintercept in case the flip factored in, but the line did not appear at the halfway mark along the x axis.
Help much appreciated.
edit: complete revamping, including info from the other answer, after grabbing the data (and storing it in a text file called mlbtmp.txt) and some more experimentation.
I played around with various permutations of facet_grid, facet_wrap, scales, coord_flip, etc. Some worked as expected, some didn’t. I ended up with facet_wrap(..., scales="free") and used ylim to constrain the axes.
In principle there might be a way to use ..density.., ..ncount.., ..ndensity.., or one of the other statistics computed by stat_bin instead of the default ..count.., but I couldn’t find a combination that worked. Instead (as is often the best solution when stuck with ggplot’s on-the-fly transformations) I reshaped the data myself:
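A minimal sketch of this kind of reshaping, assuming a data frame shaped like the question's selection. The toy rows are made up, and counts, n, total and prop are illustrative names rather than the original code. It uses coord_flip(ylim=...) instead of a bare ylim() so that bars reaching exactly 1 are zoomed rather than dropped:

```r
library(ggplot2)

# Toy stand-in for the question's `selection` data frame (values are made up).
selection <- data.frame(
  Team     = c("pit", "pit", "pit", "sea", "sea", "ari", "ari", "bal"),
  Division = c("NL Central", "NL Central", "NL Central", "AL West", "AL West",
               "NL West", "NL West", "AL East"),
  Position = c("P", "P", "O", "P", "I", "O", "O", "P"),
  stringsAsFactors = FALSE
)

# One row per player, so summing a column of 1s counts players
# per team/division/position combination.
selection$n <- 1
counts <- aggregate(n ~ Team + Division + Position, data = selection, FUN = sum)

# Each team's total, repeated on every row of that team,
# so the proportion is a plain division.
counts$total <- ave(counts$n, counts$Team, FUN = sum)
counts$prop  <- counts$n / counts$total

# Plot the pre-computed proportions; scales = "free" lets each division
# panel show only its own teams.
p <- ggplot(counts, aes(x = Team, y = prop, fill = Position)) +
  geom_bar(stat = "identity") +
  coord_flip(ylim = c(0, 1)) +
  facet_wrap(~ Division, scales = "free") +
  ylab("Proportion of players selected")
p
```

Because the proportions are computed up front, the same counts data frame also gives the raw-count version of the plot by mapping y = n and dropping the ylim argument.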
There’s obviously a little more prettying-up that could be done here, but this should get you most of the way there …
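Some of that prettying-up (points b, c and d in the question) might look like the sketch below. It assumes proportions have already been computed; the props data frame and its values are made up, and the label "Infield" for I is a guess, since the question only spells out P and O. The reference line uses geom_hline because the proportion is mapped to y; coord_flip only changes where that axis is drawn.

```r
library(ggplot2)

# Hypothetical pre-computed proportions (made-up values),
# one row per team/position.
props <- data.frame(
  Team     = c("pit", "pit", "sea", "sea", "ari", "bal"),
  Division = c("NL Central", "NL Central", "AL West", "AL West",
               "NL West", "AL East"),
  Position = c("P", "O", "P", "I", "O", "P"),
  prop     = c(0.75, 0.25, 0.5, 0.5, 1, 1),
  stringsAsFactors = FALSE
)

# (c) Legend text: recode the one-letter codes into readable labels.
# "Infield" for I is an assumption.
props$Position <- factor(props$Position,
                         levels = c("P", "O", "I"),
                         labels = c("Pitching", "Outfield", "Infield"))

# (b) coord_flip() draws the first factor level at the bottom, so reverse
# the levels to read alphabetically from the top down.
props$Team <- factor(props$Team, levels = rev(sort(unique(props$Team))))

p <- ggplot(props, aes(x = Team, y = prop, fill = Position)) +
  geom_bar(stat = "identity") +
  # (d) Halfway marker: a y intercept, drawn vertically after the flip.
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  # (d) Show 0..1 as percentages.
  scale_y_continuous(labels = scales::percent) +
  coord_flip(ylim = c(0, 1)) +
  facet_wrap(~ Division, scales = "free") +
  ylab("Share of players selected")
p
```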