I am working with a data set (column 1=gene names and column 2 =

Asked: June 8, 2026 (2026-06-08T19:58:07+00:00)

I am working with a data set (column 1=gene names and column 2 = expression values) and I’m trying to do a cluster plot but what I find is that the branches are labeled by row number rather than the gene ID from column 1.

dataset: https://dl.dropbox.com/u/364456/miRNA.csv

Using:

attach(animals)
d=dist(as.matrix(animals))
hc=hclust(d)
plot(hc)

resulting plot: [image not preserved]

I’ve tried to do kmeans clustering and end up getting this error:

NAs introduced by coercion.

Which indicates to me that I have not formatted my data file correctly.

Anyone know what’s going on here?

1 Answer

Editorial Team · Answer 1 · 2026-06-08T19:58:09+00:00

For hclust to label the branches with your gene names, that column has to be the row names of the data frame.

Problem: gene mmu-miR-191 appears twice, and row names cannot be repeated. Since the value is the same in both rows, I’m going to assume it is a duplicate and delete the second one.
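If you would rather not hunt for the repeated name by eye, a duplicate like this can be found with duplicated(). This is only a sketch on a made-up toy data frame: the column names gene and expr, the second gene name, and all values are hypothetical stand-ins, not the contents of miRNA.csv.

```r
# Toy stand-in for the real file; column names and values are made up.
toy <- data.frame(
  gene = c("mmu-miR-191", "mmu-miR-21", "mmu-miR-191"),
  expr = c(5.2, 7.1, 5.2),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of a value.
dup_name <- duplicated(toy$gene)   # same gene name seen before
dup_full <- duplicated(toy)        # whole row (name and value) seen before

which(dup_name)                    # row indices of the repeated names

# Only drop the repeats if every repeated name is also an exact duplicate row.
if (all(dup_name == dup_full)) {
  toy <- toy[!dup_name, , drop = FALSE]
}
```

Indexing with the logical vector !dup_name is safe even when nothing is duplicated, whereas toy[-which(dup_name), ] would return zero rows in that case.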

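read.table("miRNA.csv", sep=",", header=TRUE) -> mirna  # No row.names=1 here: the duplicated gene name would make the read fail.
mirna[-34,] -> mirna  # Delete the redundant row.
row.names(mirna) <- mirna[,1] # Declare column 1 as the row names
dist(as.matrix(mirna[, -1, drop=FALSE])) -> d # Distances on the expression values only (gene-name column dropped)
hc <- hclust(d) # And then your routine
plot(hc) # Branches are now labeled with the gene names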
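
About the kmeans error in the question: the kmeans call itself isn't shown, so this is a guess, but "NAs introduced by coercion" is what you would expect if the gene-name column was still part of the data, because R then tries to turn the names into numbers. A minimal sketch on made-up toy data (the names gene, expr and all values are hypothetical):

```r
# Toy stand-in for the real file; names and values are made up.
toy <- data.frame(
  gene = c("a", "b", "c", "d"),
  expr = c(1.0, 1.2, 8.9, 9.1),
  stringsAsFactors = FALSE
)

# With a text column present, as.matrix() yields a character matrix,
# which numeric routines then have to coerce.
m_bad <- as.matrix(toy)
is.character(m_bad)

# Move the names into the row names and keep only the numeric column.
row.names(toy) <- toy$gene
m_ok <- as.matrix(toy[, "expr", drop = FALSE])

set.seed(1)                              # kmeans starts from random centers
km <- kmeans(m_ok, centers = 2, nstart = 5)
km$cluster                               # named by gene, one cluster id each
```

The same numeric-only matrix can be passed to dist() and hclust(), so both clustering routes work from one cleaned object.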
