Given the input-output function in the matrix form:
|y1| = |p1 p2| |x1| |p5|
|y2| = |p3 p4| * |x2| + |p6|
with p1−p6 parameters.I want to minimize the least square error using Gauss-Newton method. Suppose we have 100 measurements. My question is about calculation and size of residual vector.
|r1| |y1| ( |p1 p2| |x1| |p5| )
|r2| = |y2| -( |p3 p4| ∗ |x2| + |p6| )
ri= output − f(input,parameters)
err = Sum(ri2)
In order to calculate parameters with minimized-error, we have:
pi+1=pi + Δ
Δ= (JT*J)-1 * JT * ri
I think the size of each is as follows:
input vector(x) : 100×2
output vector(y): 100×2
resuidual (r) : 100×2
Jacobian (J) : 100×6
parameters(pi) : 6×1 (six parameters)
As you can see, the size of Δ would be 6×2 that seems not consistent with p_i+1
Now is my residual vector calculation process correct? if yes how can I compute parameter’s vector? And if not what is the correct answer?
Another thing is the way of calculation J (Jacobian) with respect to parameters matrix.
Thank you so much.
The Delta vector is the same size as the parameter vector.
In fact in this case one can solve for the parameters without doing a full least squares. Unfortunately I’ve not been able to find a reference on the web, so I’ve attempted to explain the method:
We want to find a matrix M and a vector t so that, given N points Y[] and and N points X[], the error S is as small as possible, where
(‘ is transpose).
The drill is:
a/ compute the average Y^ of the Y[] and subtract it from each Y[i], getting y[i]
b/ compute the average X^ of the X[] and subtract it from each X[i], getting x[i]
c/ compute the matrices
and
d/ if C is not invertible (which means that all the X[] lie on a line) you’re a bit
stuck, or at least this method fails; otherwise compute
e/ finally, compute
Why does this work?
First off, if we think of M as fixed, then it’s easy to see that the t that minimises S is just the average of the Y[i]-M*X[i]; so we may as well use this t when finding M. Substituting t into the formula for S we get
Now let Tr() denote the function that maps a matrix to its trace (the sum of the diagonal elements). A trick that is quite often useful is that for a vector v we have
Applying this to S, and expanding the product, we get
If C is invertible we can complete the square and get (here D = inverse(C))
The first two terms don’t depend on M; the third is never negative and can be
made zero by choosing M=A*D.