I am creating stacked bar charts on subsets of data using a loop, exporting one stacked bar chart for each variable for each school. It's all working (each chart shows only its subset of data), except that the x axis is still labeled with every school. So I get a long chart area full of blanks, with a single stacked bar over the school the data was subset to.
Here is a sample of my data:
label    variable    class  percent
SchoolA  Optimism    High   67
SchoolA  Optimism    Med    33
SchoolA  Optimism    Low    20
SchoolA  SelfEsteem  High   84
SchoolA  SelfEsteem  Med    12
SchoolA  SelfEsteem  Low    4
SchoolB  Optimism    High   60
SchoolB  Optimism    Med    21
SchoolB  Optimism    Low    19
SchoolB  SelfEsteem  High   20
SchoolB  SelfEsteem  Med    42
SchoolB  SelfEsteem  Low    38
…which carries on for several hundred more variables and schools.
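For anyone who wants to reproduce this, the sample rows can be rebuilt as a data frame. This is only a sketch: the name df1 matches the code below, and stringsAsFactors = TRUE is an assumption that mimics the factor columns the question appears to have (R 4.0.0 and later no longer convert strings to factors by default).

```r
# Rebuild the sample rows; stringsAsFactors = TRUE makes label, variable
# and class factors, as they would have been with read.csv() before R 4.0.0
df1 <- data.frame(
  label    = rep(c("SchoolA", "SchoolB"), each = 6),
  variable = rep(rep(c("Optimism", "SelfEsteem"), each = 3), times = 2),
  class    = rep(c("High", "Med", "Low"), times = 4),
  percent  = c(67, 33, 20, 84, 12, 4, 60, 21, 19, 20, 42, 38),
  stringsAsFactors = TRUE
)

str(df1)
```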
In general I’m doing this:
library(ggplot2)

# Create a vector of the unique values for each school and variable
schools <- unique(df1$label)
variables <- unique(df1$variable)

# Function that plots each subset of data as a stacked bar chart
doPlot <- function(subdf) {
  ggplot(subdf, aes(x = label, y = percent, fill = factor(class))) +
    geom_bar(stat = "identity")
}

# Run doPlot on each subset of data
for (i in seq_along(schools)) {
  for (j in seq_along(variables)) {
    subdf <- subset(df1, label == schools[i] & variable == variables[j])
    print(doPlot(subdf))  # ggplot objects are not auto-printed inside a loop
  }
}
Why is the chart creating and labeling an x axis for the original data set and not only for the subset of data? I am an R newbie, but I've been searching and trying things out for a while and I'm stumped.
It's because when you subset a data frame with a factor column (df1$label is a factor with levels 'SchoolA' and 'SchoolB'), the label column keeps all of its levels, even though only one label is now present in it. For example:
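A minimal sketch of that behavior on a cut-down version of the question's data. It assumes label is a factor; because data.frame() no longer converts strings to factors by default since R 4.0.0, the sketch sets stringsAsFactors = TRUE explicitly.

```r
# Cut-down sample: label is a factor with levels "SchoolA" and "SchoolB"
df1 <- data.frame(
  label   = rep(c("SchoolA", "SchoolB"), each = 3),
  class   = rep(c("High", "Med", "Low"), times = 2),
  percent = c(67, 33, 20, 60, 21, 19),
  stringsAsFactors = TRUE
)

df2 <- subset(df1, label == "SchoolA")

unique(as.character(df2$label))  # only "SchoolA" is present in the rows
levels(df2$label)                # [1] "SchoolA" "SchoolB"
```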
See how, even though df2$label contains only 'SchoolA', there are still two levels? ggplot uses the levels to draw the graph. ?subset notes that unused factor levels are not automatically removed after subsetting, and points to droplevels. Going to ?droplevels, it is "used to drop unused levels from .. factors in a data frame". So try doing:
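A sketch of the question's loop with droplevels() applied to each subset, run on the sample rows (rebuilt here as df1 with stringsAsFactors = TRUE, an assumption). The ggplot2:: prefixes and the requireNamespace() guard are only there so the sketch still runs where ggplot2 is not installed; in a normal script, library(ggplot2) plus the original doPlot is enough.

```r
df1 <- data.frame(
  label    = rep(c("SchoolA", "SchoolB"), each = 6),
  variable = rep(rep(c("Optimism", "SelfEsteem"), each = 3), times = 2),
  class    = rep(c("High", "Med", "Low"), times = 4),
  percent  = c(67, 33, 20, 84, 12, 4, 60, 21, 19, 20, 42, 38),
  stringsAsFactors = TRUE
)

# Same stacked bar chart as the question's doPlot
doPlot <- function(subdf) {
  ggplot2::ggplot(subdf, ggplot2::aes(x = label, y = percent, fill = factor(class))) +
    ggplot2::geom_bar(stat = "identity")
}

schools <- unique(df1$label)
variables <- unique(df1$variable)

for (i in seq_along(schools)) {
  for (j in seq_along(variables)) {
    # droplevels() removes the levels that no longer occur in this subset,
    # so the x axis only gets the current school
    subdf <- droplevels(subset(df1, label == schools[i] & variable == variables[j]))
    # Plots are not auto-printed inside a loop, so print() explicitly
    if (requireNamespace("ggplot2", quietly = TRUE)) print(doPlot(subdf))
  }
}
```

Calling droplevels(subdf) inside doPlot instead would have the same effect.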
Then you'll get just one category on the x axis: the current school.
(As an aside: is a stacked bar chart the best way to represent these? In particular, I notice that SchoolA's 'Optimism' adds up to 120% overall, and the stacked bar chart seems to reinforce that reading of the data. If that's not how you want the data to be interpreted, perhaps you could consider a different way of presenting it?)
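One quick way to find combinations like that is to total the percentages per school and variable with base R's aggregate(). A sketch on the sample rows (rebuilt here as df1; with several hundred variables the same call applies unchanged):

```r
df1 <- data.frame(
  label    = rep(c("SchoolA", "SchoolB"), each = 6),
  variable = rep(rep(c("Optimism", "SelfEsteem"), each = 3), times = 2),
  class    = rep(c("High", "Med", "Low"), times = 4),
  percent  = c(67, 33, 20, 84, 12, 4, 60, 21, 19, 20, 42, 38),
  stringsAsFactors = TRUE
)

# Sum the class percentages for every school/variable combination
totals <- aggregate(percent ~ label + variable, data = df1, FUN = sum)
totals

# Keep only the combinations that do not add up to 100
subset(totals, percent != 100)
```

On the sample data only SchoolA/Optimism comes out at 120; the other three combinations total 100.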