The input file is:
mydata <- read.table(con <- textConnection('
gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
'), header=TRUE)
close(con)
mydata is:
gene treatment1 treatment2 treatment3
1 aaa 1 0 1
2 bbb 1 1 1
3 ccc 0 0 0
4 eee 0 1 0
To build the clustering, I did:
d <- dist(mydata, method = "euclidean")
fit <- hclust(d, method="ward")
plot(fit)
This gives me a clustering based on Euclidean distance.
In my previous Stack Overflow question, "How to use R to compute Tanimoto/Jacquard Score as distance matrix", I found that I can also calculate a Tanimoto/Jaccard distance matrix with R. Could you show me how to incorporate the Tanimoto/Jaccard distance into the steps above, so that the clustering is based on a distance matrix calculated with the Tanimoto/Jaccard distance instead of the Euclidean one? Thanks a lot.
What is it you don’t understand?
?vegdist tells us that it returns an object of class "dist", so you can just remove the dist(....) line and replace it with one calling vegdist(....). You need to drop the first column (and should have done so in the Euclidean version you showed in your question), as the gene names are not data that should be used to form the dissimilarity matrix.
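A minimal sketch of the Jaccard version, assuming the vegan package is installed and the data look like the question's (gene names in the first column). The data are read from a string with read.table(text = ...) only so the example is self-contained. Only the distance line changes; hclust() and plot() are used as before. One assumption beyond the question: in R >= 3.1.0, method = "ward" is interpreted as "ward.D" (with a message), so the sketch spells out "ward.D".

```r
library(vegan)  # provides vegdist()

mydata <- read.table(text = "
gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
", header = TRUE)

# Drop the gene column: only the 0/1 treatment columns form the dissimilarities.
# vegdist() returns an object of class "dist", just like dist().
# A warning about the empty (all-zero) row ccc is expected here.
d <- vegdist(mydata[, -1], method = "jaccard")

# Same clustering step as before; "ward.D" is what "ward" meant in older R.
fit <- hclust(d, method = "ward.D")
plot(fit)
```

With 0/1 data, vegan's quantitative Jaccard reduces to the usual binary Jaccard distance, so for example aaa and bbb (sharing 2 of 3 present treatments) end up at distance 1/3.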
That will generate a warning, because row 3 (ccc) is all zeros and so contains no information from which to form the Jaccard distance between it and the other samples. You might want to consider whether the Jaccard distance is the most appropriate choice in such cases.
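To see why an all-zero row is a problem: for binary vectors the Jaccard (Tanimoto) distance is 1 - a / (a + b + c), where a counts positions where both rows are 1 and b + c counts positions where exactly one row is 1. Row ccc has no 1s, so a = 0 and its distance to every other row is 1 regardless of what those rows contain; between two all-zero rows the ratio would be 0/0. The sketch below uses base R only; the helper name jaccard_dist is hypothetical. It computes the distance by hand, flags empty rows, and cross-checks against base R's dist(method = "binary"), which implements the same binary Jaccard distance. It is an alternative route, not the vegan approach described above.

```r
x <- as.matrix(read.table(text = "
gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
", header = TRUE, row.names = 1))

# Hypothetical helper: binary Jaccard distance 1 - a / (a + b + c).
jaccard_dist <- function(x) {
  x <- x != 0                         # treat any non-zero value as presence
  n <- nrow(x)
  out <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      both   <- sum(x[i, ] & x[j, ])  # a: present in both rows
      either <- sum(x[i, ] | x[j, ])  # a + b + c: present in at least one row
      # Two distinct empty rows give 1 - 0/0 = NaN; the diagonal stays 0.
      out[i, j] <- if (i == j) 0 else 1 - both / either
    }
  }
  as.dist(out)
}

# Rows with no presences carry no information for the Jaccard distance.
empty <- rownames(x)[rowSums(x != 0) == 0]
print(empty)

d_manual <- jaccard_dist(x)
d_base   <- dist(x, method = "binary")  # base R's binary (Jaccard) distance
print(round(d_manual, 3))
print(all.equal(as.vector(d_manual), as.vector(d_base)))
```

Checking for empty rows like this before clustering makes it easy to decide whether to drop them or pick a different dissimilarity.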
The OP now wants the gene labels as row names. The easiest option is to tell R this when reading the data in, using the row.names argument to read.table().
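A minimal sketch of that approach, assuming vegan is installed and reading the question's data from a string rather than a file so the example is self-contained. row.names = 1 tells read.table() to take the row names from the first column (gene), so no column has to be dropped before calling vegdist(), and the gene labels appear on the dendrogram.

```r
library(vegan)

# row.names = 1: use the first column ("gene") as row names
mydata <- read.table(text = "
gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
", header = TRUE, row.names = 1)
print(mydata)

# Every remaining column is data, so no [, -1] is needed here.
d <- vegdist(mydata, method = "jaccard")
fit <- hclust(d, method = "ward.D")
plot(fit)  # leaves are labelled with the gene names
```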
Or, if the data are already in R and it is a pain to reload them and redo previous computations, just assign the gene column to the row names and then remove the gene column (starting from the original mydata).
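A minimal sketch of the in-place route, recreating the original mydata (gene as an ordinary column) from a string so the example is self-contained. Setting a column to NULL is one idiomatic way to remove it; mydata[, -1] works equally well. Assigning a factor (older R versions read strings as factors) to rownames() is fine, as it is converted to character.

```r
# Start from the original mydata, with gene as an ordinary column
mydata <- read.table(text = "
gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
", header = TRUE)

rownames(mydata) <- mydata$gene  # gene labels become row names
mydata$gene <- NULL               # drop the now-redundant column
print(mydata)
```

After this, vegdist(mydata, method = "jaccard") can be called on the whole data frame, and the gene names carry through to the hclust() dendrogram.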