I want to subset a dataframe which has an ID column ( v1 ,

Question

Asked: May 25, 2026 (2026-05-25T20:20:28+00:00)

I want to subset a dataframe which has an ID column (v1, all unique) and a “linked” ID column (v2). v2 may contain NAs, but where it does, the corresponding element of v1 must not appear elsewhere in v2. The relation between the columns is also expected to be symmetric: where a row has x in v2 and y in v1, there is another row with x in v1 and y in v2. The last criterion is that the relation is not reflexive, i.e. x != y.

I want to subset the dataframe to the rows which don’t fit the expected criteria.

Here is some sample data to illustrate:

set.seed(1)
dfr <- data.frame(v1=letters,v2=rev(letters))
dfr[sample(26,10),2]<-NA
dfr[sample(26,5),2]<-sample(letters,5)


dfr
   v1   v2
1   a    z
2   b <NA>
3   c    x
4   d    w
5   e <NA>
6   f    u
7   g <NA>
8   h    s
9   i    i
10  j <NA>
11  k    p
12  l <NA>
13  m    f
14  n <NA>
15  o    l
16  p    k
17  q    j
18  r    e
19  s <NA>
20  t    g
21  u <NA>
22  v    e
23  w <NA>
24  x    q
25  y    x
26  z    a

So rows 1, 2, 11, 14, 16, and 26 all meet the criteria, and I want to identify the rest.

I have attempted some solutions using match, but the NAs are causing problems. My attempts also probably rely on the fact that in this case v2 is based on rev(v1), whereas a more general solution can’t make that assumption.
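
The exact match-based attempt is not shown, so the following is only a guess at where NAs tend to get in the way. It is a minimal sketch using hypothetical vectors (ids and link are made up for illustration): match and %in% treat NA as an ordinary value that can match another NA, while == and != propagate NA into the result.

```r
# Hypothetical vectors, not taken from the question's data.
ids  <- c("a", "b", "c")
link <- c("c", NA, "a")

# match() returns NA for the NA entry because ids contains no NA.
pos <- match(link, ids)

# %in% never returns NA, and an NA on the left does match an NA on the right.
na_found <- NA %in% c("a", NA)

# == propagates NA, so a logical index built from it can contain NA.
eq <- link == "c"

print(pos)       # 3 NA 1
print(na_found)  # TRUE
print(eq)        # TRUE NA FALSE
```

Guarding each comparison with is.na(), or using %in% where a plain TRUE/FALSE is needed, keeps NA out of the final logical index.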

1 Answer

Editorial Team · Answer 1 · 2026-05-25T20:20:29+00:00

If I understand correctly, here is an example. It keeps the rows that meet the criteria; negate the condition to get the rows that do not:

> subset(dfr, (is.na(v2) & !(v1%in%dfr$v2)) | !is.na(v2) & paste(v1, v2) %in% paste(dfr$v2, dfr$v1))
   v1   v2
1   a    z
2   b <NA>
9   i    i
11  k    p
14  n <NA>
16  p    k
26  z    a

# or, if rows with v1 == v2 should not be included:
> subset(dfr, (is.na(v2) & !(v1%in%dfr$v2)) | !is.na(v2) & (v1 != v2 & paste(v1, v2) %in% paste(dfr$v2, dfr$v1)))
   v1   v2
1   a    z
2   b <NA>
11  k    p
14  n <NA>
16  p    k
26  z    a
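
Since the question asks for the rows that do not fit, here is a sketch of the complementary subset. It is one possible variant, not the method above: it uses match instead of paste, which avoids accidental collisions if IDs ever contain spaces. It assumes v1 is unique and contains no NA, as stated in the question. The data frame is typed in from the printed table so the result does not depend on the random number generator; the names partner, back, ok and bad are introduced here for illustration.

```r
# Sample data copied from the table printed in the question.
dfr <- data.frame(
  v1 = letters,
  v2 = c("z", NA, "x", "w", NA, "u", NA, "s", "i", NA, "p", NA, "f",
         NA, "l", "k", "j", "e", NA, "g", NA, "e", NA, "q", "x", "a"),
  stringsAsFactors = FALSE
)

# Row index where each v2 value occurs as a v1 ID (NA if v2 is NA or unknown).
partner <- match(dfr$v2, dfr$v1)

# The v2 value stored in that partner row (NA if there is no partner row).
back <- dfr$v2[partner]

# Unlinked rows are fine when their v1 is not referenced anywhere in v2.
ok_unlinked <- is.na(dfr$v2) & !(dfr$v1 %in% dfr$v2)

# Linked rows are fine when they are not self-links and the partner links back.
# The is.na() guards keep NA out of the logical vector.
ok_linked <- !is.na(dfr$v2) & dfr$v1 != dfr$v2 &
  !is.na(back) & back == dfr$v1

ok  <- ok_unlinked | ok_linked
bad <- dfr[!ok, ]   # rows that violate at least one criterion

print(which(ok))    # 1 2 11 14 16 26
print(nrow(bad))    # 20
```

The rows flagged as fine agree with the ones listed in the question (1, 2, 11, 14, 16 and 26), so bad holds the remaining 20 rows.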
