I have a quite large data frame, about 10 millions of rows. It has

Question

Asked: June 16, 2026

I have a fairly large data frame, about 10 million rows. It has columns x and y, and what I want is to compute

hypot <- function(x) {sqrt(x[1]^2 + x[2]^2)}

for each row. Using apply would take a lot of time (about 5 minutes, extrapolating from smaller sizes) and memory.

That is too much for me, so I’ve tried a few different things:

compiling the hypot function reduces the time by about 10%
using functions from plyr greatly increases the running time.

What’s the fastest way to do this?


1 Answer

Editorial Team · Answer 1 · 2026-06-16T09:43:29+00:00

What about with(my_data, sqrt(x^2 + y^2))?

set.seed(101)
d <- data.frame(x=runif(1e5),y=runif(1e5))

library(rbenchmark)

Two different per-row functions, the second taking advantage of vectorization within each row:

hypot <- function(x) sqrt(x[1]^2+x[2]^2)
hypot2 <- function(x) sqrt(sum(x^2))

Try compiling these too:

library(compiler)
chypot <- cmpfun(hypot)
chypot2 <- cmpfun(hypot2)

benchmark(sqrt(d[,1]^2+d[,2]^2),
          with(d,sqrt(x^2+y^2)),
          apply(d,1,hypot),
          apply(d,1,hypot2),
          apply(d,1,chypot),
          apply(d,1,chypot2),
          replications=50)

Results:

                       test replications elapsed relative user.self sys.self
5       apply(d, 1, chypot)           50  61.147  244.588    60.480    0.172
6      apply(d, 1, chypot2)           50  33.971  135.884    33.658    0.172
3        apply(d, 1, hypot)           50  63.920  255.680    63.308    0.364
4       apply(d, 1, hypot2)           50  36.657  146.628    36.218    0.260
1 sqrt(d[, 1]^2 + d[, 2]^2)           50   0.265    1.060     0.124    0.144
2  with(d, sqrt(x^2 + y^2))           50   0.250    1.000     0.100    0.144

As expected, the with() solution and the column-indexing solution à la Tyler Rinker are essentially identical; hypot2 is roughly 1.7 times as fast as the original hypot (but still about 150 times slower than the vectorized solutions). As the OP already pointed out, compilation doesn’t help very much.
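
To check that the vectorized expression returns the same values as the row-wise apply() call, here is a minimal sketch. The data frame `small` and the column name `r` are made up for illustration and are not part of the benchmark above; the values are Pythagorean triples so the expected results are easy to verify by hand.

```r
# Small illustrative data frame (hypothetical; not the benchmark data)
small <- data.frame(x = c(3, 5, 8), y = c(4, 12, 15))

# Row-wise version from the question: called once per row by apply()
hypot <- function(x) sqrt(x[1]^2 + x[2]^2)
rowwise <- unname(apply(small, 1, hypot))

# Vectorized version: arithmetic runs on whole columns at once
vectorized <- with(small, sqrt(x^2 + y^2))

# Both approaches should agree (up to floating-point tolerance)
stopifnot(isTRUE(all.equal(rowwise, vectorized)))

# Store the result as a new column of the data frame
small$r <- vectorized
print(small)
```

The same with() call should work unchanged on the full data frame, since it only needs the two numeric columns x and y.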
