I can’t figure out what is the difference between these two functions in R.

Asked: June 14, 2026

I have a data.frame, and I want to remove the rows that correspond to duplicated values in a given column:

    Acc         Probe             Coord_homol
1   NR_004442.1 225541_at~122     391
2   NM_028059.2 241348_at~444     4642
3   NM_028059.2 241348_at~468     4666
4   NM_001114   212306_at~4357    5034
5   NM_010573.2 230472_at~402     1987
6   NM_029633.2 212306_at~4357    4289
7   NM_00108196 212306_at~4357    4292
8   NM_029891.2 205004_at~3421    2963
9   NM_029891.2 205004_at~3635    3173
10  NM_007892.2 221586_s_at~1356 1257
11  NR_036613.1 208672_s_at~829  1301
12  NR_036613.1 208673_s_at~1472 1854
13  NM_011078.3 212726_at~3872    5175
14  NM_011078.3 212726_at~3887    5190
15  NM_013915.3 207164_s_at~1523 2911

In this case, I would like to remove row 7 because its probe is the same as in row 6 (rows with the same probe do not have to be consecutive).

I first tried unique(), and later found duplicated(). The following two commands return the same number of rows, but the resulting data.frames are not the same:

dat[!duplicated(dat$probe),]

dat[unique(dat$probe),]

I then tried a much simpler case, with this data.frame:

> dat
   probe val
1    aaa  10
2    bbb  12
3    ccc  45
4    ddd  32
5    aaa  42
6    eee  10
7    fff  13
8    ccc  85
9    aaa  75
10   ddd  64

Using !duplicated() gives what I want:

dat[!duplicated(dat$probe),]

  probe val
1   aaa  10
2   bbb  12
3   ccc  45
4   ddd  32
6   eee  10
7   fff  13

Using unique():

dat[unique(dat$probe),]

I get:

 probe val
1   aaa  10
2   bbb  12
3   ccc  45
4   ddd  32
5   aaa  42
6   eee  10

This is not what I want. What exactly is unique() doing here?

Thanks for your help.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T06:00:13+00:00

unique() returns a factor here, and when a factor is used as a row index, R indexes by the factor's underlying integer codes rather than by its labels.

uni <- unique(dat$probe)
str(uni)
 Factor w/ 6 levels "aaa","bbb","ccc",..: 1 2 3 4 5 6

It is like you are doing this:

nums <- as.numeric(unique(dat$probe))
dat[nums,]
  probe val
1   aaa  10
2   bbb  12
3   ccc  45
4   ddd  32
5   aaa  42
6   eee  10
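To check this on your side, here is a minimal sketch that rebuilds the example data and compares the two ways of indexing. Note that stringsAsFactors = TRUE is my assumption to reproduce the factor column from the question; the names codes, by_factor and by_codes are only for illustration.

```r
# Rebuild the example data. stringsAsFactors = TRUE is an assumption made
# here so that probe is a factor, as in the question.
dat <- data.frame(
  probe = c("aaa", "bbb", "ccc", "ddd", "aaa", "eee", "fff", "ccc", "aaa", "ddd"),
  val   = c(10, 12, 45, 32, 42, 10, 13, 85, 75, 64),
  stringsAsFactors = TRUE
)

uni   <- unique(dat$probe)  # factor with one entry per distinct probe
codes <- as.integer(uni)    # the integer codes stored behind the factor

by_factor <- dat[uni, ]     # row index is a factor: its codes are used
by_codes  <- dat[codes, ]   # row index is the same codes, made explicit

print(codes)
print(rownames(by_factor))
print(rownames(by_codes))
```

Both selections pick rows by position, so the values of probe play no role in which rows come back.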

unique() returns a factor here only because its input, dat$probe, is a factor. It returns a vector of the same type as its input, so unique(as.character(dat$probe)) would return a character vector.
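If you want to keep the first row for each probe, the !duplicated() form from the question already does that. As a sketch of how unique() could be made to give the same rows, you can look up the position of each unique value with match(). The data is rebuilt below with stringsAsFactors = TRUE (my assumption, to match the question), and first_rows and via_match are illustrative names.

```r
# Same example data as in the question; stringsAsFactors = TRUE is an
# assumption made here so that probe is a factor.
dat <- data.frame(
  probe = c("aaa", "bbb", "ccc", "ddd", "aaa", "eee", "fff", "ccc", "aaa", "ddd"),
  val   = c(10, 12, 45, 32, 42, 10, 13, 85, 75, 64),
  stringsAsFactors = TRUE
)

# Logical index: TRUE for the first occurrence of each probe value.
first_rows <- dat[!duplicated(dat$probe), ]

# Positional index: match() gives the position of the first occurrence
# of each unique value in the original column.
pos       <- match(unique(dat$probe), dat$probe)
via_match <- dat[pos, ]

print(first_rows)
print(via_match)
```

Note that since R 4.0.0 data.frame() no longer converts strings to factors by default. With a character column, dat[unique(dat$probe), ] would try to match the values against the row names instead, which does not give the wanted rows either, so !duplicated() is the safer choice in both cases.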
