I am trying to create a pairs plot of 6 data variables using ggplot2 and colour the points according to the k-means cluster they belong to. I read the documentation of the highly impressive ‘GGally’ package as well as an informal fix by Adam Laiacano [http://adamlaiacano.tumblr.com/post/13501402316/colored-plotmatrix-in-ggplot2]. Unfortunately, I could not find any way to get the desired output in either.
Here is some sample code:
#The Swiss fertility dataset has been used here
data_ <- read.csv("/home/tejaskale/Ubuntu One/IUCAA/Datasets/swiss.csv", header=TRUE)
data_ <- na.omit(data_)
u <- c(2, 3, 4, 5, 6, 7)
x <- data_[,u]
k <- 3
maxIterations <- 100
noOfStarts <- 100
filename <- 'swiss.csv'
library(ggplot2)
library(gridExtra)
library(GGally)
kmeansOutput <- kmeans(x, k, maxIterations, noOfStarts)
xNew <- cbind(x[,1:6], as.factor(kmeansOutput$cluster))
names(xNew)[7] <- 'cluster'
kmeansPlot <- ggpairs(xNew[,1:6], color=xNew$cluster)
OR
kmeansPlot <- plotmatrix(xNew[,1:6], mapping=aes(colour=xNew$cluster))
Both plots are created but aren’t coloured according to clusters.
Hope I haven’t missed an answer to this question on the forum and apologize if that is indeed the case. Any help would be highly appreciated.
Thanks!
The following slight modification of plotmatrix2 works fine for me:
It may be a ggplot2 version issue, but I had to force the faceting variables in the densities data frame to be factors (that seems broken to me even in the GGally version). Also, generally don’t pass vectors to aes(), but simply column names.
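To illustrate the point about passing column names rather than vectors to aes(), here is a minimal sketch. It is not the plotmatrix2 modification mentioned above. It assumes a GGally release whose ggpairs() accepts `mapping` and `columns` arguments (1.0 or later, as far as I know), and it uses R's built-in `swiss` data frame in place of the CSV from the question, so the column selection may differ from the asker's file. The `set.seed()` call is only there to make the clustering repeatable.

```r
library(ggplot2)
library(GGally)

# Assumption: the built-in swiss data stands in for the question's swiss.csv
x <- na.omit(swiss)

# Same k-means settings as in the question, with the arguments named
set.seed(1)
km <- kmeans(x, centers = 3, iter.max = 100, nstart = 100)

# Store the cluster labels as a factor column of the data frame
xNew <- x
xNew$cluster <- as.factor(km$cluster)

# Refer to the column by name inside aes(); do not pass xNew$cluster
kmeansPlot <- ggpairs(xNew, columns = 1:6, mapping = aes(colour = cluster))
print(kmeansPlot)
```

Keeping `cluster` inside the data frame and restricting the panels with `columns = 1:6` lets the colour mapping find the variable without it getting a row and column of its own.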