Understanding OpenCV cv::estimateRigidTransform

Last Updated on April 26, 2021 by nghiaho12

OpenCV’s estimateRigidTransform is a pretty neat function with many uses. The function definition is

Mat estimateRigidTransform(InputArray src, InputArray dst, bool fullAffine)

The third parameter, fullAffine, is quite interesting. It allows the user to choose between a full affine transform, which has 6 degrees of freedom (rotation, translation, scaling, shearing) or a partial affine (rotation, translation, uniform scaling), which has 4 degrees of freedom. I’ve only ever used the full affine in the past but the second option comes in handy when you don’t need/want the extra degrees of freedom.

Anyone who has ever dug deep into OpenCV’s code to figure out how an algorithm works may notice the following:

  • Code documentation for the algorithms is pretty much non-existent.
  • The algorithm was probably written by some soviet Russian theoretical physicists who thought it was good coding practice to write cryptic code that only a maths major can understand.

The above applies to the cv::estimateRigidTransform to some degree. That function ends up calling static cv::getRTMatrix() in lkpyramid.cpp (what the heck?), where the maths is done.

In this post  I’ll look at the maths behind the function and hopefully shed some light on how it works.

Full 2D affine transform

The general 2D affine transform has 6 degree of freedoms of the form:

\[ A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \]

This transform combines rotation, scaling, shearing, translation and reflection in some cases.

Solving for A requires a minimum of 3 pairing points (that aren’t degenerate!). This is straight forward to do. Let’s denote the input point to be X= [x y 1]^t and the output to be Y = [x’ y’ 1]^t, giving:

\[ AX = Y \]
\[ \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \begin{bmatrix} x \\ y \\ 1\end{bmatrix} = \begin{bmatrix} x’ \\ y’ \\ 1\end{bmatrix} \]

Expanding this gives

\[ \begin{matrix} {ax + by + c = x’} \\ {dx + ey + f = y’}\end{matrix} \]

We can re-write this as a typical Ax = b problem and solve for x. We’ll also need to introduce 2 extra pair of points to be able to solve for x.

\[ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_2 & y_2 & 1 \\ x_3 & y_3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_3 & y_3 & 1\end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = \begin{bmatrix} x’_1 \\ y’_1 \\ x’_2 \\ y’_2 \\ x’_3 \\ y’_3 \end{bmatrix} \]

Now plug in your favourite linear solver to solve for [a, b, c, d, e, f]. If you have a 3 or more points, a simple least square solution can be obtained by doing a pseudo-inverse:

\[ \begin{matrix} Ax & = & b \\ A{^T}Ax & = & A^{T}b \\x & =& (A{^T}A)^{-1}A{^T}b \end{matrix} \]

Partial 2D affine transform

The partial affine transform mentioned early has a reduced degree of freedom of 4 by excluding shearing leaving only rotation, uniform scaling and translation. How do we do this? We start with the matrices for the transforms we are interested in.

\[ R (rotation) = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta)\end{bmatrix} \]
\[ S(scaling) = \begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix} \]
\[ t (translation) = \begin{bmatrix} tx \\ ty \end{bmatrix} \]

Our partial affine transform is

\[ A = \begin{bmatrix}RS | t\end{bmatrix} \]

Expanding gives

\[ A = \begin{bmatrix} \cos(\theta)s & -\sin(\theta)s & tx \\ \sin(\theta)s & \cos(\theta)s & ty\end{bmatrix} \]

We can rewrite this matrix by defining

\[ a = \cos(\theta)s \]
\[ b = \sin(\theta)s \]
\[ c = tx \]
\[ d = ty \]
\[ A = \begin{bmatrix} a & -b & c \\ b & a & d\end{bmatrix} \]

Solving for [a, b, c, d]

\[ A X = Y \]
\[ \begin{bmatrix} a & -b & c \\ b & a & d \end{bmatrix} \begin{bmatrix} x \\ y \\ 1\end{bmatrix} = \begin{bmatrix} x’ \\ y’ \end{bmatrix} \]
\[ \begin{matrix} {ax- by + c = x’} \\ {bx + ay + d = y’}\end{matrix} \]

Solving for [a, b, c, d]

\[ \begin{bmatrix} x_1 & -y_1 & 1 & 0 \\ y_1 & x_1 & 0 & 1 \\ x_2 & -y_2 & 1 & 0 \\ y_2 & x_2 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} x’_1 \\ y’_1 \\ x’_2 \\ y’_2 \end{bmatrix} \]

Notice for the partial affine transform we only need 2 pair of points instead of 3.

Final remark

Well, that’s it folks. Hopefully that gives you a better understanding of the 2D affine transform. So when should you use one or the other? I tend to the use the partial affine when I don’t want to overfit because the data has some physical constraint. On the plus side, it’s a bit faster since there are less parameters to solve for. Let me know which one you use for your application! Best answer gets a free copy of OpenCV 3.x 🙂

22 thoughts on “Understanding OpenCV cv::estimateRigidTransform”

  1. Hi Nghia,

    thanks for this nice overview. I was wondering somehow, why you state the partial transform has 4 degrees of freedom. In my understanding, it adds up to 5 degrees of freedom. Moreover, do you have any idea how this function handles the estimation with more points than necessary?

    Thanks,

    Brian

    1. Hi,

      The partial transform (I think also called similarity transform) has [theta, scale, tx, ty]. If you have more than the required points it ends up with a least square solution.

  2. Hi Nghia,
    very helpful blog post! I am trying to code up a live stabilization program in python, on the base of your c++ code. As i use estimateRigidTransform() i see that it has scaling as well. Don’t you think scaling is also changing translation parameters?
    Best Regards, Greg

    1. I haven’t tried scaling actually. I got good result with translation and rotation that I didn’t look further.

  3. The last line of the equations deriving the least squares approximation to the transform should be:

    x = (A^{T} A)^{-1} A^{T} b

    which is

    x = A^{+} b

    where the superscript + denotes the Moore–Penrose pseudoinverse.

  4. Actually is quite subtle why the second derivation works, as it requires the fact that you can always find (theta,s) for any value of a and b that comes out of your solution, such that a = cos(theta)s and b=sin(theta)s. This is not the case if you had the requirement s=1 for instance. Even when s if free (as is the case), you need to show that theta exists such that a*cos(theta) = b*sin(theta) for any a and b (assuming s!=0). This is in fact the case, but it’s not especially trivial IMO.

    1. It is a very interesting post, congratulation Nghia Ho!
      I am interested in the point mentioned by monte_carlo_method. How can I find the best affine estimation considering s=1, i.g the rotation and translation that best match the two set of points? Could you tell me a book or a paper where I can find it?

      Following the second deviation, we have:
      a = cos(\theta) s
      b = sin(\theta) s
      Can we find \theta solving this system like
      \frac{a}{cos(\theta)} = \frac{b}{sin(\theta)},
      \theta = arc tg ( \frac{b}{a}) ?

      Thank you all!

  5. Hello,

    Is there a way to see get the point opencv is using for computing the transformation? Also, is there a way of finding out how opencv is getting this points? is it using a gradient?

    Thanks in advance

        1. calcOpticalFlowPyrLK is only used if you give in an 8bit image as input else it treats them as a set of 2D points.

  6. Thanks for the amazing post!

    I just have one question. There are cases that cv2.estimateRigidTransformation return None value. I do understand that it means “unable to find the transformation matrix”. But I want to know/understand why it happens (e.g. mathematics or whatever it is) . Could you please share your thoughts on this matter? Thanks.

    1. This probably means the matrix was not invertible, possibly because it is under-constrained. For example, if all your points are co-linear then the full 2D affine will fail.

  7. I try to usee the cv.estimateAffinePartial2D , but I always get an error. Someone use it on python and can share is code, please?

  8. There is an error at AX = Y, a multiplication of a 2×3 matrix and 3×1 matrix yields a 2×1 matrix, therefore the matrix Y1,1 is x’ and Y1,2 is y’ and there is no Y1,3. The rest works out, good post.

Leave a Reply

Your email address will not be published. Required fields are marked *