I am searching several large files for replicate gene entries. There are several duplicates

Asked: May 25, 2026 (2026-05-25T13:20:38+00:00)

I am searching several large files for replicate gene entries. There are several duplicates and at least one triplicate entry in my list of genes. I just want to find out which lines are the replicates.

I get the error:

Error in if (genes[i, 1] == genes[j, 1] && i != j) { : 
missing value where TRUE/FALSE needed

I am at a roadblock.

genes <- combine[c(4)]
num_rows <- nrow(genes)
dup_combine <- vector(mode="character", length=100)
n=1
for (i in 1:num_rows) {
only_check_rows <- num_rows-1
   for (j in i+1:only_check_rows) {
      if (genes[i,1] == genes[j,1]&&i!=j) {
         dup_combine[n] <- combine[i,1]
         n=n+1
         cat("i=",i,"j=",j,"\n")
      }
   }
}


1 Answer

Editorial Team · Answer 1 · 2026-05-25T13:20:38+00:00

It looks like you are searching for duplicates in a single vector (genes). There are several ways to do this. Here’s some example data:

dat <- c(1,2,3,2,4,4,6,NA,8,NA,13)

table will count the number of occurrences of each unique value in dat. Note that I use exclude = NULL to force it to count NA values as well:

table(dat, exclude = NULL)
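If only the repeated values are of interest, the counts can be filtered. This is a minimal sketch that reuses the example dat from above; the names counts and repeated are introduced here purely for illustration.

```r
dat <- c(1, 2, 3, 2, 4, 4, 6, NA, 8, NA, 13)

# Count every value, including NA
counts <- table(dat, exclude = NULL)

# Keep only the values that occur more than once
repeated <- counts[counts > 1]
print(repeated)
```

A triplicate entry would simply show up here with a count of 3.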

As mentioned in a comment, duplicated also applies. This function returns a boolean vector indicating specifically which entries are duplicates of previous entries. fromLast = TRUE tells it to look from back to front, rather than from front to back.

duplicated(dat)
duplicated(dat, fromLast = TRUE)

You can combine these two directions to get all the duplicated elements:

subset(dat, duplicated(dat) | duplicated(dat, fromLast = TRUE))
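Since the question asks which lines are replicates, it may help to get positions rather than values. The sketch below is one possible way to do that, again on the example dat; is_rep, rep_pos and rep_groups are illustrative names, not anything from the original code.

```r
dat <- c(1, 2, 3, 2, 4, 4, 6, NA, 8, NA, 13)

# Flag every element that has a twin anywhere in the vector
is_rep <- duplicated(dat) | duplicated(dat, fromLast = TRUE)

# Positions (line numbers) of those elements
rep_pos <- which(is_rep)

# Group the positions by value; addNA() keeps NA as its own group
rep_groups <- split(rep_pos, addNA(dat[is_rep], ifany = TRUE))
print(rep_groups)
```

Each list element then holds the positions that share one value, so a triplicate appears as a group of three positions.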

If you are working with data frames, rather than single vectors, duplicated is probably the way to go.
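When only one column should decide what counts as a duplicate, as with the gene column in the question, duplicated can be applied to that column and the result used to pick rows. The data frame below is a made-up stand-in for combine; the column names and gene symbols are assumptions, and only the fact that the genes sit in column 4 comes from the question.

```r
# Hypothetical stand-in for the asker's data; only column 4 (gene) matters here
combine <- data.frame(
  id    = 1:6,
  chr   = c("1", "1", "2", "2", "3", "3"),
  start = c(100, 250, 300, 410, 520, 630),
  gene  = c("BRCA1", "TP53", "BRCA1", "EGFR", "TP53", "BRCA1"),
  stringsAsFactors = FALSE
)

# Compare on the gene column only
genes <- combine[[4]]
is_rep <- duplicated(genes) | duplicated(genes, fromLast = TRUE)

# All rows whose gene appears more than once, grouped by gene
rep_rows <- combine[is_rep, ]
rep_rows <- rep_rows[order(rep_rows$gene), ]
print(rep_rows)
```

The row names of rep_rows are the original row numbers, which answers the "which lines" part directly.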

Edit

Here’s a short worked example using a sample data frame:

dat <- data.frame(x = c(1,2,3,4,4,5,6,5,9),
        y = c(2,3,1,2,2,6,2,6,10))
> dat
  x  y
1 1  2
2 2  3
3 3  1
4 4  2
5 4  2
6 5  6
7 6  2
8 5  6
9 9 10

#Boolean vector of duplicated rows
duplicated(dat)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

#Indices of duplicated rows   
which(duplicated(dat))
[1] 5 8

#Look in both directions to get all dups (indices)
which(duplicated(dat) | duplicated(dat, fromLast = TRUE))
[1] 4 5 6 8

#The actual rows
subset(dat,duplicated(dat) | duplicated(dat, fromLast = TRUE))
  x y
4 4 2
5 4 2
6 5 6
8 5 6
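As for the error in the question's loop, a likely cause is operator precedence: in R, : binds more tightly than +, so i+1:only_check_rows is read as i + (1:only_check_rows). The inner index then runs past the last row, indexing a data frame beyond its rows gives NA, and if() cannot handle an NA condition. The sketch below illustrates this with small made-up values; it is not the asker's data.

```r
num_rows <- 5
i <- 3

# ':' binds tighter than '+', so this is i + (1:(num_rows - 1))
wrong <- i + 1:(num_rows - 1)
print(wrong)  # includes indices larger than num_rows

# Parenthesize the lower bound, and guard the last row so the
# sequence does not count backwards
right <- if (i < num_rows) (i + 1):num_rows else integer(0)
print(right)

# Indexing a data frame past its last row returns NA, which is
# what makes the if() condition fail
genes <- data.frame(gene = c("a", "b", "a"), stringsAsFactors = FALSE)
out_of_range <- genes[nrow(genes) + 1, 1]
print(is.na(out_of_range == genes[1, 1]))
```

With the bounds fixed the loop should run, although the duplicated approach above avoids the nested loop entirely and scales better to large files.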
