Another question about computer vision. A camera matrix (also known as projection matrix) maps


Asked: June 13, 2026 (2026-06-13T02:45:20+00:00)


Another question about computer vision.

A camera matrix (also known as projection matrix) maps a 3D point X (e.g. in the real world) to an image point x (in a photograph, for example) via the following relation:

λ **x** = P **X**

P describes some external and internal characteristics of the camera (its orientation, position and projection properties). When we refer to the projection properties, we use a calibration matrix K. Likewise, R represents the rotation of the camera and t its translation, so we can write P as:

P = K [ R | t ]

[ R | t ] means the concatenation of the matrix R and t.

R is a 3 × 3 matrix
t is a 3 × 1 vector
K is a 3 × 3 matrix
[ R | t ] is a 3 × 4 matrix
As a consequence, P is a 3 × 4 matrix

Well, enough introductions. I want to recover the translation t from the camera matrix P. According to the code in the book Programming Computer Vision with Python, it can be found like this:

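from numpy import array, eye, hstack, dot, diag, sign
from scipy import linalg

def rotation_matrix(a):
    """ Creates a 3D rotation matrix for rotation
    around the axis of the vector a. """
    a = array(a).astype('float')
    R = eye(4)
    R[:3,:3] = linalg.expm([[0,-a[2],a[1]],[a[2],0,-a[0]],[-a[1],a[0],0]])
    return R

# K is not defined in the snippet as posted; this example calibration
# matrix is an assumption so that the code runs.
K = array([[1000,0,500],[0,1000,300],[0,0,1]])
tmp = rotation_matrix([0,0,1])[:3,:3]
Rt = hstack((tmp,array([[50],[40],[30]])))
P = dot(K, Rt)
K, R = linalg.rq(P[:,:3])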

# This part gets rid of some ambiguity in the solutions of K and R
T = diag(sign(diag(K)))
if linalg.det(T) < 0:
    T[1,1] *= -1
    K = dot(K, T)
    R = dot(T, R) # T is its own inverse

t = dot(linalg.inv(K), P[:,3])

The code is self-contained. There we have Rt that is the matrix [R | t]. P is calculated as usual and an RQ factorization is performed. However, I don’t understand that part. Why are we taking only the first 3 columns? Then we obtain the translation vector as the dot product of K^{-1} and the first 3 columns of P. Why? I haven’t found a justification but maybe it’s something obvious I’m missing.

By the way, the code seems to be a bit off. When I run it, I get a translation vector [ 50. -40. 30.] instead of array([[50],[40],[30]]) that we used as input. We should get exactly the same. I don’t know if this is due to the rotation matrix. I would also appreciate any help on that.

Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-06-13T02:45:22+00:00

The translation vector is the product of inv(K) and the 4th column of P, not its first three columns. Notice that your code says:

t = dot(linalg.inv(K), P[:,3])

Here P[:,3] is the fourth column of the projection matrix, since indexing starts at 0. Getting t back is expected: P = K [ R | t ] = [ KR | Kt ], so the fourth column is Kt, and inv(K) * Kt = t.
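To make the block structure concrete, here is a minimal NumPy sketch. The values of K, the rotation angle theta and t are assumptions chosen for illustration; only the shapes matter for the argument.

```python
import numpy as np

# Assumed example values (not taken from your post, except t).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 300.0],
              [0.0, 0.0, 1.0]])
theta = 1.0  # rotation about the z axis, in radians
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([50.0, 40.0, 30.0])

# P = K [R | t], a 3 x 4 matrix
P = K @ np.hstack((R, t.reshape(3, 1)))

# The first three columns equal K R, the fourth column equals K t.
first_block_is_KR = np.allclose(P[:, :3], K @ R)
last_column_is_Kt = np.allclose(P[:, 3], K @ t)

# inv(K) applied to the fourth column gives t back.
# np.linalg.solve(K, b) computes inv(K) @ b without forming the inverse.
t_recovered = np.linalg.solve(K, P[:, 3])
print(first_block_is_KR, last_column_is_Kt, t_recovered)
```

As long as the K used in the last step is the same K that built P, the recovered vector matches t up to floating-point error.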

You can RQ-decompose P[:,:3] into the calibration matrix K and the rotation matrix R for the same reason: P = [ KR | Kt ], so the first three columns are the product KR. Calibration matrices are upper triangular and rotation matrices are orthogonal, which is exactly the form an RQ decomposition produces.
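A small check of that claim, as a sketch: it uses scipy.linalg.rq, as your code does, on a product K R built from assumed example values. It only verifies the structure of the factors, not their signs.

```python
import numpy as np
from scipy import linalg

# Assumed example calibration matrix and rotation (illustrative values).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 300.0],
              [0.0, 0.0, 1.0]])
theta = 1.0  # rotation about the z axis, in radians
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

M = K @ R  # plays the role of P[:, :3]

# linalg.rq returns an upper triangular factor and an orthogonal factor.
K_rq, R_rq = linalg.rq(M)

is_upper_triangular = np.allclose(K_rq, np.triu(K_rq))
is_orthogonal = np.allclose(R_rq @ R_rq.T, np.eye(3))
reproduces_M = np.allclose(K_rq @ R_rq, M)
print(is_upper_triangular, is_orthogonal, reproduces_M)
```

Note that this does not check K_rq against K entry by entry: the factors are only determined up to sign.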

As to why you are getting a different translation vector than you expected, I think it might be because RQ decompositions, like QR decompositions, are not unique in general. As per Wikipedia, the QR decomposition of an invertible matrix is unique only when we require all the diagonal elements of R to be positive. In Wikipedia's notation R is the upper triangular factor, which in your case is K (not your rotation matrix R).
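The non-uniqueness is easy to see numerically. In this sketch (K, theta and the sign matrix D are assumed example values), flipping the sign of one column of K and of the matching row of R gives a second pair of factors with the same product.

```python
import numpy as np

# Assumed example values, for illustration only.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 300.0],
              [0.0, 0.0, 1.0]])
theta = 1.0  # rotation about the z axis, in radians
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

# D is diagonal with entries +1 or -1, so D @ D is the identity.
D = np.diag([1.0, -1.0, 1.0])
K_alt = K @ D   # second column of K negated
R_alt = D @ R   # second row of R negated

same_product = np.allclose(K_alt @ R_alt, K @ R)
still_upper_triangular = np.allclose(K_alt, np.triu(K_alt))
still_orthogonal = np.allclose(R_alt @ R_alt.T, np.eye(3))
print(same_product, still_upper_triangular, still_orthogonal)
print(np.diag(K_alt))  # one diagonal entry is now negative
```

Both (K, R) and (K_alt, R_alt) are valid upper-triangular-times-orthogonal factorizations of the same matrix; only the first has an all-positive diagonal in the triangular factor.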

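If the K you end up with after the decomposition has a negative element anywhere on its diagonal, it is a different K from the one you started with (perhaps different only in the sign of one column). inv(K) times the fourth column of P then returns t with the corresponding component negated, which would explain getting [ 50. -40. 30.] instead of the t you expected.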
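One way to remove the ambiguity, sketched here under the assumption that the true K has a strictly positive diagonal, is to flip signs until every diagonal entry of the recovered K is positive. This deliberately differs from your code: it omits the extra T[1,1] flip, which, when det(T) < 0, appears to leave one diagonal entry of K negative. The names K_true, R_true and t_true are illustrative.

```python
import numpy as np
from scipy import linalg

# Assumed ground truth used to build P (illustrative values).
K_true = np.array([[1000.0, 0.0, 500.0],
                   [0.0, 1000.0, 300.0],
                   [0.0, 0.0, 1.0]])
theta = 1.0  # rotation about the z axis, in radians
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([50.0, 40.0, 30.0])
P = K_true @ np.hstack((R_true, t_true.reshape(3, 1)))

# Factor the first 3 x 3 block; the signs of the factors are arbitrary.
K, R = linalg.rq(P[:, :3])

# T has +1 or -1 on its diagonal, matching the signs of diag(K).
T = np.diag(np.sign(np.diag(K)))
K = K @ T   # every diagonal entry of K is now positive
R = T @ R   # T is its own inverse, so K @ R is unchanged

# inv(K) times the fourth column of P
t = np.linalg.solve(K, P[:, 3])
print(t)
```

With this normalization the recovered K, R and t agree with the values used to build P, up to floating-point error. If det(P[:,:3]) were negative, the recovered R would have determinant -1, so that case would need separate handling (for example, negating P first).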
