I have an operation I’d like to run for each row of a data

Asked: May 27, 2026

I have an operation I’d like to run for each row of a data frame, changing one column. I’m an apply/ddply/sqldf man, but I’ll use loops when they make sense, and I think this is one of those times. This case is tricky because the column to change depends on information that varies by row: depending on the value in one cell, I should change only one of ten other cells in that row. With 75 columns and 20000 rows, the operation takes 10 minutes, while every other operation in my script takes 0-5 seconds, ten seconds at most. I’ve stripped my problem down to the very simple test case below.

n <- 20000
t.df <- data.frame(matrix(1:5000, ncol=10, nrow=n))
system.time(
  for (i in 1:nrow(t.df)) {
    t.df[i, (t.df[i,1] %% 10 + 1)] <- 99
  }
)

This takes 70 seconds with ten columns, and 360 seconds when ncol=50. That’s crazy. Are loops the wrong approach? Is there a better, more efficient way to do this?

I already tried initializing the nested term (t.df[i,1]%%10 + 1) as a list outside the for loop. It saves about 30 seconds (out of 10 minutes) but makes the example code above more complicated. So it helps, but it’s not the solution.

My current best idea came while preparing this test case. For me, only 10 of the columns are relevant (and 75-11 columns are irrelevant). Since the run times depend so much on the number of columns, I can just run the above operation on a data frame that excludes irrelevant columns. That will get me down to just over a minute. But is “for loop with nested indices” even the best way to think about my problem?
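The column-subsetting idea above could look roughly like the sketch below. It is only an illustration: the names wide, narrow and relevant are made up here, the positions 1:10 are an assumption about where the relevant columns sit, and n is reduced so the loop finishes quickly.

```r
n <- 2000  # smaller than the 20000 in the question, so the sketch runs quickly
wide <- data.frame(matrix(1:5000, ncol = 50, nrow = n))

relevant <- 1:10            # assumed positions of the columns the loop touches
narrow <- wide[, relevant]  # work on a data frame with only those columns

for (i in seq_len(nrow(narrow))) {
  # the value in column 1 of row i decides which column of that row is overwritten
  narrow[i, narrow[i, 1] %% 10 + 1] <- 99
}

wide[, relevant] <- narrow  # copy the modified columns back into the full data frame
```

The loop is unchanged; it just runs on the narrower data frame, which is what the timings above (70 seconds at ten columns versus 360 at fifty) suggest should help.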

1 Answer

Editorial Team · Answer 1 · 2026-05-27T06:38:07+00:00

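Using row and col seems less complicated to me:

t.df[col(t.df) == (row(t.df) %% 10) + 1] <- 99

This matches the loop on the test case because column 1 there cycles through 1:5000 in row order, so row(t.df) %% 10 equals t.df[, 1] %% 10. If the first column held arbitrary values, the comparison would need t.df[, 1] in place of row(t.df).

I think Tommy’s is still faster, but using row and col might be easier to understand.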
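For the general case, where the target column depends on the value in column 1 and not on the row number, a loop-free version could be sketched with matrix indexing as below. This is only one possible approach, not necessarily the one from the other answer mentioned above. The helper name set_by_first_col is hypothetical, and the sketch assumes every column is numeric so the round trip through as.matrix is safe.

```r
# Hypothetical helper: for each row, overwrite the column selected by
# (value in column 1) %% 10 + 1. Assumes all columns are numeric.
set_by_first_col <- function(df, value = 99) {
  m <- as.matrix(df)                      # numeric matrix copy of the data frame
  target <- m[, 1] %% 10 + 1              # target column for every row, computed at once
  idx <- cbind(seq_len(nrow(m)), target)  # one (row, column) pair per row
  m[idx] <- value                         # assign all selected cells in a single step
  as.data.frame(m)                        # back to a data frame with the same column names
}

n <- 20000
t.df <- data.frame(matrix(1:5000, ncol = 10, nrow = n))
t.df <- set_by_first_col(t.df)
```

A two-column index matrix picks exactly one cell per row, so no logical matrix the size of the whole data frame is needed.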
