I am searching several large files for replicate gene entries. There are several duplicates and at least one triplicate entry in my list of genes. I just want to be able to find out which lines are the replicates.
I get the error:
Error in if (genes[i, 1] == genes[j, 1] && i != j) { :
missing value where TRUE/FALSE needed
I am at a roadblock.
genes <- combine[c(4)]
num_rows <- nrow(genes)
dup_combine <- vector(mode="character", length=100)
n=1
for (i in 1:num_rows) {
  only_check_rows <- num_rows-1
  for (j in i+1:only_check_rows) {
    if (genes[i,1] == genes[j,1]&&i!=j) {
      dup_combine[n] <- combine[i,1]
      n=n+1
      cat("i=",i,"j=",j,"\n")
    }
  }
}
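A likely cause of the error, for what it is worth: in R the `:` operator binds more tightly than `+`, so `i+1:only_check_rows` is evaluated as `i + (1:only_check_rows)`. For any `i` greater than 1, `j` then runs past the last row, `genes[j,1]` is `NA`, and `if` cannot decide on `NA`. A small sketch with made-up values (the vector `x` and the sizes are illustrative only):

```r
# Made-up sizes standing in for the loop variables in the question
i <- 2
num_rows <- 5
only_check_rows <- num_rows - 1

# As written in the loop: parsed as i + (1:only_check_rows)
as_written <- i + 1:only_check_rows
print(as_written)   # runs up to 6, one past the last row

# Probably what was intended: rows i+1 through num_rows
intended <- (i + 1):num_rows
print(intended)

# Indexing past the end yields NA, and comparing with NA yields NA
x <- c("a", "b", "c", "d", "e")
print(x[6])
print(is.na(x[6] == x[2]))
```

Note that `(i + 1):num_rows` counts backwards when `i` equals `num_rows`, so the outer loop would also need to stop at `num_rows - 1`.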
It looks like you are searching for duplicates in a single vector (`genes`). There are several ways to do this. Here’s some example data:
`table` will count the number of occurrences of each unique value in `dat`. Note I use `exclude = NULL` to force it to count `NA` values as well:
As mentioned in a comment, `duplicated` also applies. This function returns a boolean vector indicating specifically which entries are duplicates of previous entries.
`fromLast = TRUE` tells it to look from back to front, rather than from front to back.
You can combine these two directions to get all the duplicated elements:
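A minimal sketch of the steps above, assuming a small made-up vector `dat` of gene names (the values are illustrative and not taken from the asker's files):

```r
# Hypothetical example data: two duplicated genes, one triplicate, two NAs
dat <- c("BRCA1", "TP53", "EGFR", "TP53", "MYC",
         "BRCA1", "TP53", NA, "KRAS", NA)

# Count each unique value; exclude = NULL keeps NA in the counts
counts <- table(dat, exclude = NULL)
print(counts[counts > 1])

# TRUE where an entry repeats an earlier entry
dup_forward <- duplicated(dat)

# TRUE where an entry is repeated by a later entry
dup_backward <- duplicated(dat, fromLast = TRUE)

# Combine both directions to flag every entry that occurs more than once
all_dups <- dup_forward | dup_backward

# Positions (line numbers) and values of all replicated entries
print(which(all_dups))   # 1 2 4 6 7 8 10
print(dat[all_dups])
```

`which(all_dups)` gives the positions directly, which is what the nested loop in the question was trying to collect.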
If you are working with data frames, rather than single vectors, `duplicated` is probably the way to go.
Edit
Here’s a short worked example using a sample data frame:
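One way such an example might look, assuming a made-up data frame named `combine` with the gene names in its fourth column, as in the question (the column names and values are hypothetical):

```r
# Hypothetical stand-in for the asker's data; gene names are in column 4
combine <- data.frame(
  id    = 1:7,
  chrom = c("17", "17", "7", "17", "8", "17", "17"),
  start = c(100, 200, 300, 200, 400, 100, 200),
  gene  = c("BRCA1", "TP53", "EGFR", "TP53", "MYC", "BRCA1", "TP53"),
  stringsAsFactors = FALSE
)

# Pull the gene column out as a plain vector
genes <- combine[[4]]

# Flag every row whose gene occurs more than once, in either direction
is_dup <- duplicated(genes) | duplicated(genes, fromLast = TRUE)

# Row numbers of the replicated entries
dup_rows <- which(is_dup)
print(dup_rows)   # 1 2 4 6 7

# The replicated rows themselves, grouped by gene for easier reading
dups <- combine[is_dup, ]
print(dups[order(dups$gene), ])
```

This replaces the nested loop with two vectorised calls, so there is no index arithmetic to get wrong.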