I’m trying to speed up the function below (for later bootstrapping) which performs least squares fitting of a straight line with errors in both x and y. I think the main hang up is in the while loop. The input values for the function are the observations x and y and the absolute uncertainties in those values sx and sy.
york <- function(x, y, sx, sy){
x <- cbind(x)
y <- cbind(y)
# initial least squares regression estimation
fit <- lm(y ~ x)
a1 <- as.numeric(fit$coefficients[1]) # intercept
b1 <- as.numeric(fit$coefficients[2]) # slope
e1 <- cbind(as.numeric(fit$residuals)) # residuals
theta.fit <- rbind(a1, b1)
# constants
rho.xy <- 0 # correlation between x and y
# initialize york regression
X <- cbind(1, x)
a <- a1
b <- b1
tol <- 1e-15 # tolerance
d <- tol
i = 0
# york regression
while (d > tol || d == tol){
i <- i + 1
a2 <- a
b2 <- b
theta2 <- rbind(a2, b2)
e <- y - X %*% theta2
w <- 1 / sqrt((sy^2) + (b2^2 * sx^2) - (2 * b2 * sx * sy * rho.xy))
W <- diag(w)
theta <- solve(t(X) %*% (W %*% W) %*% X) %*% t(X) %*% (W %*% W) %*% y
a <- theta[1]
b <- theta[2]
mswd <- (t(e) %*% (W%*%W) %*% e)/(length(x) - 2)
sfit <- sqrt(mswd)
Vo <- solve(t(X) %*% (W %*% W) %*% X)
dif <- b - b2
d <- abs(dif)
}
# format results to data.frame
th <- data.frame(a, b)
names(th) <- c("intercept", "slope")
ft <- data.frame(mswd, sfit)
names(ft) <- c("mswd", "sfit")
df <- data.frame(x, y, sx, sy, as.vector(e), diag(W))
names(df) <- c("x", "y", "sx", "sy", "e", "W")
# store output results
list(coefficients = th,
vcov = Vo,
fit = ft,
df = df)
}
Your function can be sped up with a few simple changes. Primarily, you should move anything out of the while loop that doesn’t need to be there. For example, you run
solvetwice on the same data. Also, you calculate thesfiton every iteration, when you only use it on the last iteration of the while loop.Here is my code:
A little test:
I noticed that
rho.xyis always fixed at zero. Is this perhaps a mistake?I noticed as well that you often use
cbindto convert avectorinto amatrixwith one column. All vectors are automatically considered matrices with one column, so you can avoid a lot of extra code.As @joran mentioned, the tolerance level is set so small that it will take a long time to converge; consider using a larger tolerance.