Another question about computer vision.
A camera matrix (also known as a projection matrix) maps a 3D point **X** (e.g. in the real world) to an image point **x** (e.g. in a photograph) via the following relation:
λ **x** = P **X**, where λ is a nonzero scale factor and **x** and **X** are expressed in homogeneous coordinates.
P encodes both external and internal characteristics of the camera (its orientation, position and projection properties). The projection properties are captured by a calibration matrix K, the camera's rotation by a matrix R and its translation by a vector t, so we can write P as:
P = K [ R | t ]
[ R | t ] denotes the horizontal concatenation of the matrix R and the vector t.
- R is a 3 × 3 matrix
- t is a 3 × 1 vector
- K is a 3 × 3 matrix
- [ R | t ] is a 3 × 4 matrix
- As a consequence, P is a 3 × 4 matrix
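To make the dimensions concrete, here is a small NumPy sketch. The numeric values of K, R and t are illustrative assumptions, not taken from the question. It builds P = K [R | t], confirms the 3 × 4 shape, checks the block structure P = [KR | Kt], and projects one point by dividing P **X** by the scale factor λ (the third homogeneous coordinate).

```python
import numpy as np

# Illustrative values (assumptions, not taken from the question).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 300.0],
              [0.0, 0.0, 1.0]])                # 3 x 3 calibration matrix
R = np.eye(3)                                  # 3 x 3 rotation (identity for simplicity)
t = np.array([[50.0], [40.0], [30.0]])         # 3 x 1 translation

Rt = np.hstack((R, t))                         # [R | t], 3 x 4
P = K @ Rt                                     # 3 x 4 camera matrix
print(Rt.shape, P.shape)                       # (3, 4) (3, 4)

# Block structure: P = K [R | t] = [KR | Kt]
assert np.allclose(P[:, :3], K @ R)
assert np.allclose(P[:, 3:], K @ t)

# Projecting a 3D point: lambda * x = P X, with X = [X, Y, Z, 1]
X = np.array([0.0, 0.0, 10.0, 1.0])
x_h = P @ X                                    # equals lambda * x
x = x_h / x_h[2]                               # divide by lambda (the third coordinate)
# x is [1750, 1300, 1] for these values
```

The block structure check is the fact the rest of the thread relies on: the first three columns of P are KR and the last column is Kt.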
Well, enough introductions. I want to recover the translation vector t from the camera matrix P. According to the code in the book Computer Vision with Python, it can be found like this:
```python
from numpy import array, eye, hstack, dot, diag, sign
from scipy import linalg

def rotation_matrix(a):
    """ Creates a 3D rotation matrix for rotation
    around the axis of the vector a. """
    a = array(a).astype('float')
    R = eye(4)
    R[:3,:3] = linalg.expm([[0,-a[2],a[1]],[a[2],0,-a[0]],[-a[1],a[0],0]])
    return R

# K is not defined in the snippet; an illustrative calibration matrix is assumed here
K = array([[1000,0,500],[0,1000,300],[0,0,1]])

tmp = rotation_matrix([0,0,1])[:3,:3]
Rt = hstack((tmp,array([[50],[40],[30]])))
P = dot(K, Rt)

K, R = linalg.rq(P[:,:3])

# This part gets rid of some ambiguity in the solutions of K and R
T = diag(sign(diag(K)))
if linalg.det(T) < 0:
    T[1,1] *= -1
K = dot(K, T)
R = dot(T, R) # T is its own inverse
t = dot(linalg.inv(K), P[:,3])
print(t)
```
The code is self-contained. There, Rt is the matrix [R | t]. P is computed as usual, and then an RQ factorization is performed. However, I don't understand that part: why are we taking only the first 3 columns? Then we obtain the translation vector as the product of K^{-1} and the first 3 columns of P. Why? I haven't found a justification, but maybe it's something obvious I'm missing.
By the way, the code seems to be a bit off. When I run it, I get the translation vector [ 50. -40. 30.] instead of the array([[50],[40],[30]]) we used as input. We should get exactly the same vector back. I don't know if this is due to the rotation matrix; I would also appreciate any help on that.
Thanks!
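Since the question asks whether the rotation matrix could be to blame, here is a quick check. It re-declares the question's `rotation_matrix` construction so the block runs on its own (NumPy and SciPy assumed). It verifies that `rotation_matrix([0,0,1])` is the rotation by 1 radian about the z axis (the matrix exponential of a skew-symmetric matrix rotates by an angle equal to the length of a), and that it is orthogonal with determinant +1.

```python
import numpy as np
from scipy import linalg

def rotation_matrix(a):
    """Same construction as in the question: 4x4 homogeneous rotation about
    the axis of a, by an angle equal to the length of a."""
    a = np.asarray(a, dtype=float)
    R = np.eye(4)
    # Matrix exponential of the skew-symmetric (cross-product) matrix of a
    R[:3, :3] = linalg.expm(np.array([[0.0, -a[2], a[1]],
                                      [a[2], 0.0, -a[0]],
                                      [-a[1], a[0], 0.0]]))
    return R

R = rotation_matrix([0, 0, 1])[:3, :3]
c, s = np.cos(1.0), np.sin(1.0)
expected = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # 1 rad about z

print(np.allclose(R, expected))            # True
print(np.allclose(R.T @ R, np.eye(3)))     # True: orthogonal
print(np.isclose(np.linalg.det(R), 1.0))   # True: proper rotation
```

If these checks pass, the input [R | t] is correct, which points to the factorization step rather than the rotation matrix as the source of the sign flip.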
You calculate the translation vector as the product of `inv(K)` and the 4th column of `P`. Notice that your code says `t = dot(linalg.inv(K), P[:,3])`, where `P[:,3]` is the fourth column of the projection matrix, since indexing starts at 0. Getting back `t` is therefore expected: `P = [KR | Kt]`, so `Kt` is the 4th column, and `inv(K) * Kt = t`.

You can RQ-decompose `P[:,:3]` into the calibration matrix `K` and the rotation matrix `R`, again because `P = [KR | Kt]`, and because calibration matrices are upper triangular and rotation matrices are orthogonal.

As to why you are getting a different translation vector than you expected, I think it might be due to the fact that QR (and likewise RQ) decompositions are not unique in general. As per Wikipedia, for an invertible matrix they are unique only when we require all the diagonal elements of `R` to be positive. Here `R` is the upper triangular matrix, in your case `K`.

If your matrix `K` has a negative element anywhere on the diagonal, you may get back a different `K` (perhaps differing only in one sign) from the decomposition. This would mean that you don't get back the `t` you expected.
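Looking at the snippet more closely supports this. After `K = dot(K, T)` every diagonal entry of `K` is positive, but whenever the `if linalg.det(T) < 0: T[1,1] *= -1` branch runs, it flips the sign of `K[1,1]`. By uniqueness of the positive-diagonal RQ factorization, `K` then equals the true calibration matrix times D = diag(1, -1, 1), so `t = inv(K) P[:,3]` comes back as D t = [50, -40, 30], which is consistent with the output reported in the question. The sketch below is one possible fix, not the book's code: the function name `factor` is hypothetical, the K values are an illustrative assumption, and instead of flipping `T[1,1]` it negates P when det(P[:, :3]) < 0 (allowed because P is only defined up to scale), which guarantees det(R) = +1 together with a positive-diagonal K. It assumes NumPy and SciPy.

```python
import numpy as np
from scipy import linalg

def factor(P):
    """Hypothetical helper: split a 3x4 camera matrix P = K [R | t] (known only
    up to scale) into K (positive diagonal, K[2, 2] = 1), R (det +1) and t."""
    P = np.asarray(P, dtype=float)
    if np.linalg.det(P[:, :3]) < 0:
        # P is only defined up to scale, so -P describes the same camera.
        # Making det(P[:, :3]) > 0 ensures det(R) = +1 below.
        P = -P
    K, R = linalg.rq(P[:, :3])        # K upper triangular, R orthogonal
    # T = diag(+-1) is its own inverse, so (K @ T) @ (T @ R) == K @ R.
    T = np.diag(np.sign(np.diag(K)))
    K = K @ T                         # all diagonal entries of K are now positive
    R = T @ R                         # still orthogonal; det(R) = +1 since det(K) > 0
    t = np.linalg.solve(K, P[:, 3])   # P[:, 3] = K t, so t = inv(K) @ P[:, 3]
    return K / K[2, 2], R, t          # the overall scale cancels out of t

# Rebuild the question's setup (illustrative K values, an assumption).
K_true = np.array([[1000.0, 0.0, 500.0],
                   [0.0, 1000.0, 300.0],
                   [0.0, 0.0, 1.0]])
c, s = np.cos(1.0), np.sin(1.0)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # 1 rad about z
t_true = np.array([50.0, 40.0, 30.0])
P = K_true @ np.column_stack((R_true, t_true))

K, R, t = factor(P)
print(np.round(t, 6))                 # [50. 40. 30.]

K2, R2, t2 = factor(-2.0 * P)         # same camera, different scale and sign
print(np.allclose(t2, t_true))        # True
```

The second call shows that the recovered K, R and t do not depend on the overall scale or sign of P, which is why negating P is a safe way to resolve the determinant ambiguity.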