I’m trying to speed up the function below (for later bootstrapping) which performs least

Question


Asked: June 3, 2026 (2026-06-03T01:57:33+00:00)


I’m trying to speed up the function below (for later bootstrapping), which performs least squares fitting of a straight line with errors in both x and y. I think the main bottleneck is the while loop. The inputs to the function are the observations x and y and their absolute uncertainties sx and sy.

york <- function(x, y, sx, sy){

    x <- cbind(x)
    y <- cbind(y)

    # initial least squares regression estimation
    fit <- lm(y ~ x)
    a1 <- as.numeric(fit$coefficients[1])   # intercept
    b1 <- as.numeric(fit$coefficients[2])   # slope
    e1 <- cbind(as.numeric(fit$residuals))  # residuals
    theta.fit <- rbind(a1, b1)

    # constants
    rho.xy <- 0     # correlation between x and y

    # initialize york regression
    X <- cbind(1, x)
    a <- a1
    b <- b1
    tol <- 1e-15    # tolerance
    d <- tol
    i = 0

    # york regression
    while (d > tol || d == tol){
        i <- i + 1
        a2 <- a
        b2 <- b
        theta2 <- rbind(a2, b2)
        e <- y - X %*% theta2
        w <- 1 / sqrt((sy^2) + (b2^2 * sx^2) - (2 * b2 * sx * sy * rho.xy))
        W <- diag(w)
        theta <- solve(t(X) %*% (W %*% W) %*% X) %*% t(X) %*% (W %*% W) %*% y

        a <- theta[1]
        b <- theta[2]

        mswd <- (t(e) %*% (W%*%W) %*% e)/(length(x) - 2)
        sfit <- sqrt(mswd)
        Vo <- solve(t(X) %*% (W %*% W) %*% X)
        dif <- b - b2
        d <- abs(dif)
        }

    # format results to data.frame
    th <- data.frame(a, b)
    names(th) <- c("intercept", "slope")
    ft <- data.frame(mswd, sfit)
    names(ft) <- c("mswd", "sfit")
    df <- data.frame(x, y, sx, sy, as.vector(e), diag(W))
    names(df) <- c("x", "y", "sx", "sy", "e", "W")

    # store output results
    list(coefficients = th,
        vcov = Vo,
        fit = ft,
        df = df)
}
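For reference, here is what each pass of the while loop computes, written out from the code (my reading of it; the symbols are just names for the code's variables, with ρ = rho.xy). In the code, `w` holds the square roots of the weights below, so `W %*% W` corresponds to W(b), and `Vo` is the inverse matrix in the update.

```latex
\begin{aligned}
w_i(b) &= \frac{1}{\sigma_{y,i}^2 + b^2\sigma_{x,i}^2 - 2b\rho\,\sigma_{x,i}\sigma_{y,i}},
  \qquad \mathbf{W}(b) = \operatorname{diag}\bigl(w_1(b),\dots,w_n(b)\bigr),\\
\begin{pmatrix} a_{k+1}\\ b_{k+1}\end{pmatrix}
  &= \bigl(\mathbf{X}^\top\mathbf{W}(b_k)\mathbf{X}\bigr)^{-1}\mathbf{X}^\top\mathbf{W}(b_k)\,\mathbf{y},
  \qquad \mathbf{X} = \begin{pmatrix}\mathbf{1} & \mathbf{x}\end{pmatrix},
\end{aligned}
```

The loop stops once |b_{k+1} - b_k| < tol. Only the slope enters the weights, which is why each iteration is a weighted least squares fit with weights refreshed from the previous slope.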


1 Answer

Editorial Team · Answer 1 · 2026-06-03T01:57:34+00:00

Your function can be sped up with a few simple changes. Primarily, move anything that doesn’t need to be inside the while loop out of it, and avoid repeating work within an iteration. For example, you call solve() twice per iteration on the same matrix, t(X) %*% (W %*% W) %*% X. You also compute mswd and sfit on every iteration, although only the values from the last iteration are used.
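To check that this restructuring doesn't change the results, here is a small sketch on made-up data (not from the question). It confirms that `W %*% W` equals `diag(w^2)`, and that inverting the normal matrix once and reusing the inverse gives the same estimate and covariance as the two separate solve() calls in the original loop body.

```r
# Made-up example data (not from the question)
set.seed(42)
n  <- 50
x  <- runif(n, 0, 10)
y  <- 2 + 0.5 * x + rnorm(n)
sx <- runif(n, 0.1, 0.5)
sy <- runif(n, 0.1, 0.5)
X  <- cbind(1, x)
b  <- 0.5  # some current slope estimate

w <- 1 / sqrt(sy^2 + b^2 * sx^2)
W <- diag(w)

# Original loop body: W %*% W formed repeatedly and solve() called twice
theta_orig <- solve(t(X) %*% (W %*% W) %*% X) %*% t(X) %*% (W %*% W) %*% y
Vo_orig    <- solve(t(X) %*% (W %*% W) %*% X)

# Restructured: square the weights directly, invert once, reuse the inverse
w2    <- diag(w^2)
base  <- crossprod(X, w2)      # same as t(X) %*% w2
Vo    <- solve(base %*% X)
theta <- Vo %*% base %*% y

all.equal(W %*% W, w2)         # TRUE
all.equal(theta_orig, theta)   # TRUE
all.equal(Vo_orig, Vo)         # TRUE
```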

Here is my code:

york.fast <- function(x, y, sx, sy, tol=1e-15){
    # initial least squares regression estimation
    fit <- lm(y ~ x)
    theta <- fit$coefficients
    # initialize york regression
    X <- cbind(1, x)
    d <- tol
    # york regression
    while (d >= tol){
        b2 <- theta[2]
        # w <- 1 / sqrt((sy^2) + (b2^2 * sx^2) - (2 * b2 * sx * sy * rho.xy)) # rho.xy is always zero!
        w <- 1 / sqrt(sy^2 + (b2^2 * sx^2))  # rho.xy is always zero!
        # W <- diag(w)
        # w2 <- W %*% W
        w2 <- diag(w^2) # As suggested in the comments.
        base <- crossprod(X,w2)
        Vo <- solve(base %*% X)
        theta <- Vo %*% base %*% y
        d <- abs(theta[2] - b2)
    }
    e <- y - X %*% theta
    mswd <- (crossprod(e,w2) %*% e) / (length(x) - 2)
    sfit <- sqrt(mswd)

    # format results to data.frame
    th <- data.frame(intercept=theta[1], slope=theta[2])
    ft <- data.frame(mswd=mswd, sfit=sfit)
    df <- data.frame(x=x, y=y, sx=sx, sy=sy, e=as.vector(e), W=w)  # w is already diag(W)

    # store output results
    list(coefficients = th, vcov = Vo, fit = ft, df = df)
}

A little test:

n=225
set.seed(1)
x=rnorm(n)
y=rnorm(n)
sx=rnorm(n)
sy=rnorm(n)

system.time(test<-york.fast(x,y,sx,sy)) # 0.37 s
system.time(gold<-york(x,y,sx,sy)) # 1.28 s
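A further step, beyond what york.fast does: diag(w^2) still allocates an n x n matrix on every iteration, and crossprod(X, w2) then costs O(n^2) time, while everything else in the loop is O(n). Since X'WX only needs each row of X scaled by its weight, crossprod(X, X * w2) gives the same matrices in O(n) memory. The sketch below uses the hypothetical name york_vec, assumes rho.xy = 0 like york.fast, and adds a maxit cap so the loop cannot run forever; the defaults tol = 1e-10 and maxit = 100 are my own illustrative choices, not from the question.

```r
# Hypothetical variant of york.fast that never builds an n x n matrix.
# Assumes rho.xy = 0; tol and maxit defaults are illustrative choices.
york_vec <- function(x, y, sx, sy, tol = 1e-10, maxit = 100) {
  X <- cbind(1, x)
  theta <- unname(coef(lm(y ~ x)))        # ordinary least squares start
  for (i in seq_len(maxit)) {
    b_old <- theta[2]
    w2 <- 1 / (sy^2 + b_old^2 * sx^2)     # squared weights, length n
    Xw <- X * w2                          # scales row i of X by w2[i]
    Vo <- solve(crossprod(X, Xw))         # (X' W X)^-1, a 2 x 2 matrix
    theta <- as.vector(Vo %*% crossprod(Xw, y))  # (X' W X)^-1 X' W y
    if (abs(theta[2] - b_old) < tol) break
  }
  if (abs(theta[2] - b_old) >= tol)
    warning("york_vec: no convergence after ", maxit, " iterations")
  e <- as.vector(y - X %*% theta)
  mswd <- sum(w2 * e^2) / (length(x) - 2) # e' W e / (n - 2)
  list(coefficients = data.frame(intercept = theta[1], slope = theta[2]),
       vcov = Vo,
       fit = data.frame(mswd = mswd, sfit = sqrt(mswd)),
       df = data.frame(x = x, y = y, sx = sx, sy = sy, e = e, W = sqrt(w2)),
       iterations = i)
}
```

As a sanity check, with sx = 0 the weights no longer depend on the slope, so the result should match lm(y ~ x, weights = 1 / sy^2).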

I noticed that rho.xy is always fixed at zero, so the cross term in the weights always vanishes (york.fast simply drops it). Is this perhaps a mistake?
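If a nonzero correlation between the x and y errors is ever needed, one option is to keep rho as an argument with default 0 instead of deleting the term, so that the simplified formula in york.fast becomes a special case. A minimal sketch; the helper name york_w2 is hypothetical, and it returns the squared weights (the diagonal of W %*% W in the question's code).

```r
# Hypothetical helper: squared weights 1 / (sy^2 + b^2 sx^2 - 2 b rho sx sy).
# With rho = 0 it reduces to the formula used in york.fast.
york_w2 <- function(b, sx, sy, rho = 0) {
  1 / (sy^2 + b^2 * sx^2 - 2 * b * rho * sx * sy)
}

sx <- c(0.1, 0.2, 0.3)
sy <- c(0.3, 0.2, 0.1)
york_w2(b = 2, sx, sy)             # same as 1 / (sy^2 + 4 * sx^2)
york_w2(b = 2, sx, sy, rho = 0.5)  # correlated errors change the weights
```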

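I noticed as well that you often use cbind to convert a vector into a one-column matrix. This isn’t needed here: a plain vector is not a matrix, but %*% and crossprod() promote it to a row or column matrix as required to make the product conformable, and lm() accepts vectors directly, so you can avoid a lot of extra code.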
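A quick illustration on toy numbers: the vectors below never get a dim attribute, yet the matrix products come out with the shapes the regression code needs.

```r
x <- c(1, 2, 3)
y <- c(2, 4, 7)
X <- cbind(1, x)          # 3 x 2 design matrix; x itself stays a plain vector

is.matrix(y)              # FALSE: no dim attribute
dim(X %*% c(0.5, 2))      # 3 1: c(0.5, 2) is treated as a 2 x 1 column
dim(crossprod(X, y))      # 2 1: y is treated as a 3 x 1 column
dim(t(y) %*% y)           # 1 1
```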

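As @joran mentioned, the tolerance is set so small that the loop can take a long time to converge: 1e-15 is only a few times the machine epsilon (.Machine$double.eps, about 2.2e-16), so rounding error can keep d from ever falling below it. Consider using a larger tolerance.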
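To see why an absolute tolerance of 1e-15 is risky: doubles with the same exponent as a slope b are spaced 2^(floor(log2|b|) - 52) apart, so once |b| >= 8 that spacing already exceeds 1e-15, and `d >= tol` can only become false when two successive slopes are bit-for-bit identical. The helper names ulp and converged below are hypothetical, the ulp formula assumes a normal, nonzero double, and the relative criterion is just one common alternative, not something from the question.

```r
# Spacing of doubles with the same exponent as b (normal, nonzero b only)
ulp <- function(b) 2^(floor(log2(abs(b))) - 52)

ulp(1)     # 2.220446e-16, same as .Machine$double.eps
ulp(10)    # 1.776357e-15, already larger than 1e-15

# Adding less than half a spacing to 10 changes nothing:
10 + ulp(10) / 4 == 10   # TRUE
10 + ulp(10) == 10       # FALSE

# One alternative: a relative criterion that scales with the slope
converged <- function(b_new, b_old, rtol = 1e-10) {
  abs(b_new - b_old) <= rtol * max(1, abs(b_old))
}
```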
