Anyone have a good reference for how to do a multivariate ordinary linear regression without saving the input data (and get the R-squared of the result). The use case is a data set with too many rows to store. The regression can be obtained by accumulating x[i]*x[j] and y * x[i], and then doing the matrix math from there, but I can’t find a similar formula to get the statistics when I’m done (R-squared for starters). Thanks.
I don’t have a good reference, but the way I’d approach it is to expand out the sum-of-squared expressions and write them in terms of the expectations (averages) that you are accumulating.
I use <.> to indicate averaging over rows of data, so that <y> is the average of the y-values, and so on.
At any point we can obtain the regression coefficients a[i] and b from the matrix <x[i]*x[j]> and the vector <y*x[i]>, as you indicated in your question. I also write sum_i{ a[i]*x[i] } to indicate a sum over the components that comprise the independent variables.
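For reference, here is one way to write that matrix math in the same <.> notation, assuming the model is y ≈ sum_i{ a[i]*x[i] } + b fitted by least squares (setting the derivatives of the mean squared residual with respect to b and each a[j] to zero). Note that with an intercept the averages <x[i]> and <y> enter as well:

```latex
b = \langle y \rangle - \sum_i a_i \langle x_i \rangle

\sum_i a_i \left( \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \right)
  = \langle y\, x_j \rangle - \langle y \rangle \langle x_j \rangle
  \qquad \text{for each } j
```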
A way to compute the explained mean-squared deviation, <(sum_i{ a[i]*x[i] } + b - <y>)^2>, is to expand it in terms of these accumulated averages.
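A sketch of that expansion, assuming the intercept b is included in the fit (the equation for b then gives <ŷ> = <y>, where ŷ is the fitted value). The double sum uses the off-diagonal <x[i]*x[j]> terms as well as the diagonal ones:

```latex
\hat{y} = \sum_i a_i x_i + b, \qquad \langle \hat{y} \rangle = \langle y \rangle

\left\langle (\hat{y} - \langle y \rangle)^2 \right\rangle
  = \left\langle \Big( \sum_i a_i \left( x_i - \langle x_i \rangle \right) \Big)^2 \right\rangle
  = \sum_i \sum_j a_i a_j \left( \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \right)
```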
You already maintain <x[i]*x[i]> as the diagonal elements of the matrix for deriving the regression coefficients (and the off-diagonal <x[i]*x[j]> as the rest of that matrix).
You will also need to maintain the averages of the independent variables (<x[i]> for each i) as well as of the dependent variable (<y>). Similar expansions can be carried out for either the total or the residual mean squared error (the total additionally needs <y*y>), and then used to compute the R^2 value as explained/total, or equivalently 1 - residual/total.
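Putting the pieces together, here is a minimal Python sketch, assuming NumPy is available. The class name StreamingOLS and its methods are hypothetical, made up for illustration. It keeps only running sums (n, x, x*x^T, y, y*y, y*x), solves the centered normal equations, and computes R^2 both from the explained and from the residual expansion. One caveat: these raw-moment formulas subtract nearly equal numbers when the means are large compared to the spread of the data, so a Welford-style update of centered sums is more robust in that case.

```python
import numpy as np


class StreamingOLS:
    """Hypothetical helper: fits y ~ sum_i a[i]*x[i] + b from running sums only."""

    def __init__(self, p):
        self.n = 0
        self.sx = np.zeros(p)        # running sum of x[i]
        self.sxx = np.zeros((p, p))  # running sum of x[i]*x[j]
        self.sy = 0.0                # running sum of y
        self.syy = 0.0               # running sum of y*y
        self.sxy = np.zeros(p)       # running sum of y*x[i]

    def update(self, x, y):
        """Add one row; the row itself can be discarded afterwards."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.sx += x
        self.sxx += np.outer(x, x)
        self.sy += y
        self.syy += y * y
        self.sxy += y * x

    def result(self):
        n = self.n
        mx, my = self.sx / n, self.sy / n        # <x[i]>, <y>
        mxx, mxy = self.sxx / n, self.sxy / n    # <x[i]*x[j]>, <y*x[i]>
        myy = self.syy / n                       # <y*y>
        cxx = mxx - np.outer(mx, mx)             # <x[i]*x[j]> - <x[i]><x[j]>
        cxy = mxy - my * mx                      # <y*x[j]> - <y><x[j]>
        a = np.linalg.solve(cxx, cxy)            # slopes from the normal equations
        b = my - a @ mx                          # intercept
        explained = a @ cxx @ a                  # <(yhat - <y>)^2>
        total = myy - my * my                    # <(y - <y>)^2>
        residual = (myy + a @ mxx @ a + b * b    # <(y - yhat)^2>, expanded
                    - 2 * (a @ mxy) - 2 * b * my + 2 * b * (a @ mx))
        return a, b, explained / total, 1 - residual / total


# Usage: stream synthetic rows through the accumulator, then check against
# an ordinary least-squares fit on the stored data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.5, size=500)

acc = StreamingOLS(3)
for xi, yi in zip(X, y):
    acc.update(xi, yi)
a, b, r2_explained, r2_residual = acc.result()

A = np.column_stack([X, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
r2_direct = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
```

The direct check at the end only exists to confirm the streaming numbers; in the real use case the rows would not be kept.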