I have three dataframes, and I want to add some columns to the


Asked: June 4, 2026 (2026-06-04T08:48:51+00:00)


I have three dataframes, and I want to add columns to the first dataframe that count the number of times each pair of values in its first two columns appears in each of the other dataframes, e.g.

dataframe – x
a b
1 1
1 2
2 1
2 2

dataframe – y
a b
1 1
1 1
1 2
2 2
2 2

dataframe – z
a b
1 2
2 1
2 1
2 2

So the first dataframe would become
a b y z
1 1 2 0
1 2 1 1
2 1 0 2
2 2 2 1

I have ways to do this, e.g. I am currently doing

x$y <- sapply(1:nrow(x), function(i){
    sum(y$a == x$a[i] & y$b == x$b[i])
  })

x$z <- sapply(1:nrow(x), function(i){
    sum(z$a == x$a[i] & z$b == x$b[i])
  })

But my dataframe is very large and my approach takes a while to complete, so I was wondering what the quickest way to do this is.

Please ask if anything is unclear.

Thanks in advance


1 Answer

Editorial Team · Answer 1 · 2026-06-04T08:48:52+00:00

To avoid the double loop, I would use the function match, which is optimized for finding the positions of elements in another vector. To get the counts, I propose tabulating the variables first and then matching against the table.

My guess is that this significantly reduces the time complexity: the method you propose is quadratic (one loop goes over the rows of x, and for each row an implicit inner loop goes over the rows of y), whereas match and table rely on hashing and sorting (I think), which should be at worst about n*log(n).
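As a small illustration of the idea, here is a sketch of table plus match on toy vectors. The names keys and lookup are made up for this example and do not appear in the question.

```r
# Toy data: `keys` plays the role of the other data frame,
# `lookup` the role of the first one (both names are hypothetical).
keys   <- c("b", "a", "b", "c", "b")
lookup <- c("a", "b", "d")

tab <- table(keys)                   # one pass to count every distinct key
pos <- match(lookup, names(tab))     # position of each lookup value, NA if absent
counts <- as.integer(tab[pos])       # pull the counts out of the table
counts[is.na(counts)] <- 0L          # values that never occur get a count of 0
print(counts)
```

Each value in lookup is found once in the table instead of being compared against every element of keys.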

We first turn the data frames into vectors with paste, taken from the answer of Josh:

# Recreate your data
x <- data.frame(a=c(1,1,2,2), b=c(1,2,1,2))
y <- data.frame(a=c(1,1,1,2,2), b=c(1,1,2,2,2))
z <- data.frame(a=c(1,2,2,2), b=c(2,1,1,2))

# Use paste to combine the two columns
X <- do.call(paste, c(x, sep="_"))
Y <- do.call(paste, c(y, sep="_"))
Z <- do.call(paste, c(z, sep="_"))
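
For reference, this is a quick look at what the pasted keys contain. It also shows a caveat that is my own addition, not part of the original answer: if the columns were character columns containing the separator, two different pairs could produce the same key. The values k1 and k2 are made up to show this.

```r
x <- data.frame(a = c(1, 1, 2, 2), b = c(1, 2, 1, 2))
X <- do.call(paste, c(x, sep = "_"))
print(X)   # one key per row of x, e.g. "1_1" for a = 1, b = 1

# Hypothetical collision: different pairs, same key, because the
# separator also occurs inside the values.
k1 <- paste("a_b", "c", sep = "_")
k2 <- paste("a", "b_c", sep = "_")
print(identical(k1, k2))
```

With purely numeric columns, as in the question, this is not an issue. Otherwise it is probably safer to pick a separator that cannot occur in the data.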

Then we tabulate and match against the tabulation.

x$y <- table(Y)[match(X, names(table(Y)))]
x$y[is.na(x$y)] <- 0

x$z <- table(Z)[match(X, names(table(Z)))]
x$z[is.na(x$z)] <- 0

x
  a b y z
1 1 1 2 0
2 1 2 1 1
3 2 1 0 2
4 2 2 2 1

You could put table(Y) in an intermediate variable if you want to avoid tabulating twice.
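
One possible way to package this is sketched below, with the table stored in an intermediate variable. The function count_pairs and its cols argument are hypothetical names introduced here, and the synthetic data at the end only serves as a rough check that the result agrees with the sapply version from the question.

```r
# Hypothetical helper: count how often each (a, b) pair of `target`
# occurs in `other`, tabulating `other` only once.
count_pairs <- function(target, other, cols = c("a", "b")) {
  key_target <- do.call(paste, c(target[cols], sep = "_"))
  key_other  <- do.call(paste, c(other[cols], sep = "_"))
  tab <- table(key_other)                        # intermediate variable
  counts <- as.integer(tab[match(key_target, names(tab))])
  counts[is.na(counts)] <- 0L                    # pairs absent from `other`
  counts
}

x <- data.frame(a = c(1, 1, 2, 2), b = c(1, 2, 1, 2))
y <- data.frame(a = c(1, 1, 1, 2, 2), b = c(1, 1, 2, 2, 2))
z <- data.frame(a = c(1, 2, 2, 2), b = c(2, 1, 1, 2))
x$y <- count_pairs(x, y)
x$z <- count_pairs(x, z)
print(x)

# Rough comparison on synthetic data (the sizes are arbitrary assumptions)
set.seed(1)
n <- 2000
big_x <- unique(data.frame(a = sample(50, n, TRUE), b = sample(50, n, TRUE)))
big_y <- data.frame(a = sample(50, n, TRUE), b = sample(50, n, TRUE))

t_loop <- system.time(
  res_loop <- sapply(seq_len(nrow(big_x)), function(i) {
    sum(big_y$a == big_x$a[i] & big_y$b == big_x$b[i])
  })
)
t_tab <- system.time(res_tab <- count_pairs(big_x, big_y))

stopifnot(all(res_loop == res_tab))              # both approaches agree
print(c(loop = t_loop[["elapsed"]], table = t_tab[["elapsed"]]))
```

The timings depend on the machine and on the data, so they are best read as a rough indication and not as a benchmark.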
