I am working with a data set (column 1 = gene names, column 2 = expression values) and I’m trying to make a cluster plot, but the branches are labeled by row number rather than by the gene ID from column 1.
dataset: https://dl.dropbox.com/u/364456/miRNA.csv
Using:
attach(animals)
d=dist(as.matrix(animals))
hc=hclust(d)
plot(hc)
resulting plot:

I’ve tried to do kmeans clustering and end up getting this error:
NAs introduced by coercion.
Which indicates to me that I have not formatted my data file correctly.
Anyone know what’s going on here?
For hclust to recognize your gene names as the branch labels, this column has to be the row names. Problem: gene mmu-miR-191 appears twice, and row names cannot be repeated. Considering the values for both rows are the same, I’m just gonna assume it is a duplicate and erase the second one.
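Here is a rough sketch of that fix. The real file isn't loaded here, so the data frame below is a made-up stand-in, and the column names gene and expression are my assumption (swap in whatever your CSV actually uses). It drops the repeated gene, moves the names into the row names, and clusters on the numeric column only:

```r
# Toy stand-in for the miRNA data; column names and values are made up.
animals <- data.frame(
  gene = c("mmu-miR-191", "mmu-miR-21", "mmu-miR-16", "mmu-miR-191"),
  expression = c(5.2, 9.1, 7.4, 5.2),
  stringsAsFactors = FALSE
)

# Keep only the first occurrence of each gene, since row names must be unique.
animals <- animals[!duplicated(animals$gene), ]

# Move the gene names into the row names and keep only the numeric column.
rownames(animals) <- animals$gene
expr <- animals[, "expression", drop = FALSE]

# Cluster on numeric data only; hclust picks up its labels from the row names.
d <- dist(as.matrix(expr))
hc <- hclust(d)
plot(hc)
```

Leaving the gene names out of the matrix is probably also what you need for kmeans: with a text column in there, as.matrix gives a character matrix, which would explain the "NAs introduced by coercion" message.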