Last Updated on April 26, 2021 by nghiaho12
OpenCV’s estimateRigidTransform is a pretty neat function with many uses. The function definition is
Mat estimateRigidTransform(InputArray src, InputArray dst, bool fullAffine)
The third parameter, fullAffine, is quite interesting. It allows the user to choose between a full affine transform, which has 6 degrees of freedom (rotation, translation, scaling, shearing), and a partial affine (rotation, translation, uniform scaling), which has 4 degrees of freedom. I've only ever used the full affine in the past, but the second option comes in handy when you don't need/want the extra degrees of freedom.
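As a quick illustration, here is a minimal sketch of calling it both ways. The point values are made up for the example; the header path is the OpenCV 2.4/3.x video module, where the function lives.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>  // cv::estimateRigidTransform
#include <iostream>
#include <vector>

int main() {
    // Hypothetical point correspondences (src -> dst)
    std::vector<cv::Point2f> src = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
    std::vector<cv::Point2f> dst = {{10, 10}, {12, 10}, {10, 12}, {12, 12}};

    // Full affine: 6 DOF (rotation, translation, scaling, shearing)
    cv::Mat full = cv::estimateRigidTransform(src, dst, true);

    // Partial affine: 4 DOF (rotation, translation, uniform scaling)
    cv::Mat partial = cv::estimateRigidTransform(src, dst, false);

    // Both return a 2x3 matrix (empty if the estimation failed)
    std::cout << full << "\n" << partial << "\n";
    return 0;
}
```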
Anyone who has ever dug deep into OpenCV’s code to figure out how an algorithm works may notice the following:
- Code documentation for the algorithms is pretty much non-existent.
- The algorithm was probably written by some Soviet Russian theoretical physicists who thought it was good coding practice to write cryptic code that only a maths major can understand.
The above applies to cv::estimateRigidTransform to some degree. That function ends up calling the static cv::getRTMatrix() in lkpyramid.cpp (what the heck?), where the maths is done.
In this post I’ll look at the maths behind the function and hopefully shed some light on how it works.
Full 2D affine transform
The general 2D affine transform has 6 degrees of freedom and is of the form:
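A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}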
This transform combines rotation, scaling, shearing, translation and, in some cases, reflection.
Solving for A requires a minimum of 3 point pairs (that aren't degenerate!). This is straightforward to do. Let's denote the input point to be X = [x y 1]^T and the output to be Y = [x' y']^T, giving:
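AX = Y

\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}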
Expanding this gives
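\begin{aligned} x' &= ax + by + c \\ y' &= dx + ey + f \end{aligned}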
We can re-write this as a typical Ax = b problem and solve for x. We'll also need to introduce 2 extra pairs of points to be able to solve for x.
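\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_2 & y_2 & 1 \\ x_3 & y_3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_3 & y_3 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ x'_2 \\ y'_2 \\ x'_3 \\ y'_3 \end{bmatrix}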
Now plug in your favourite linear solver to solve for [a, b, c, d, e, f]. If you have 3 or more points, a simple least squares solution can be obtained via the pseudo-inverse:
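x = \left(A^{T} A\right)^{-1} A^{T} b

If you want to see this in code, here is a minimal sketch (my own, not OpenCV's implementation) that builds the stacked system above for N ≥ 3 correspondences and solves it in a least squares sense with cv::solve:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Least squares full 2D affine from N >= 3 point pairs (a sketch, not OpenCV's code)
cv::Mat fullAffineLeastSquares(const std::vector<cv::Point2f>& src,
                               const std::vector<cv::Point2f>& dst) {
    int n = static_cast<int>(src.size());
    cv::Mat A = cv::Mat::zeros(2 * n, 6, CV_64F);
    cv::Mat b(2 * n, 1, CV_64F);

    for (int i = 0; i < n; i++) {
        // row for x' = a*x + b*y + c
        A.at<double>(2 * i, 0) = src[i].x;
        A.at<double>(2 * i, 1) = src[i].y;
        A.at<double>(2 * i, 2) = 1.0;
        // row for y' = d*x + e*y + f
        A.at<double>(2 * i + 1, 3) = src[i].x;
        A.at<double>(2 * i + 1, 4) = src[i].y;
        A.at<double>(2 * i + 1, 5) = 1.0;

        b.at<double>(2 * i, 0)     = dst[i].x;
        b.at<double>(2 * i + 1, 0) = dst[i].y;
    }

    cv::Mat x;                           // [a b c d e f]^T
    cv::solve(A, b, x, cv::DECOMP_SVD);  // least squares via SVD
    return x.reshape(1, 2);              // reshape into the 2x3 transform matrix
}
```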
Partial 2D affine transform
The partial affine transform mentioned earlier has a reduced 4 degrees of freedom: it excludes shearing (and non-uniform scaling), leaving only rotation, uniform scaling and translation. How do we do this? We start with the matrices for the transforms we are interested in.
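R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \;\text{(rotation)}, \qquad S = \begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix} \;\text{(uniform scaling)}, \qquad t = \begin{bmatrix} t_x \\ t_y \end{bmatrix} \;\text{(translation)}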
Our partial affine transform is
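\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} sR & t \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}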
Expanding gives
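\begin{aligned} x' &= s\cos\theta \, x - s\sin\theta \, y + t_x \\ y' &= s\sin\theta \, x + s\cos\theta \, y + t_y \end{aligned}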
We can rewrite this matrix by defining
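a = s\cos\theta, \quad b = s\sin\theta, \quad c = t_x, \quad d = t_y

which gives

\begin{aligned} x' &= ax - by + c \\ y' &= bx + ay + d \end{aligned}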
Solving for [a, b, c, d]
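\begin{bmatrix} x_1 & -y_1 & 1 & 0 \\ y_1 & x_1 & 0 & 1 \\ x_2 & -y_2 & 1 & 0 \\ y_2 & x_2 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ x'_2 \\ y'_2 \end{bmatrix}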
Notice that for the partial affine transform we only need 2 pairs of points instead of 3.
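To make the derivation concrete, here is a minimal sketch (my own code, not OpenCV's internal getRTMatrix()) that solves this system in a least squares sense for N ≥ 2 correspondences using cv::solve:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Least squares partial 2D affine ([a b c d]) from N >= 2 point pairs (a sketch)
cv::Mat partialAffineLeastSquares(const std::vector<cv::Point2f>& src,
                                  const std::vector<cv::Point2f>& dst) {
    int n = static_cast<int>(src.size());
    cv::Mat M = cv::Mat::zeros(2 * n, 4, CV_64F);
    cv::Mat rhs(2 * n, 1, CV_64F);

    for (int i = 0; i < n; i++) {
        // x' = a*x - b*y + c
        M.at<double>(2 * i, 0) =  src[i].x;
        M.at<double>(2 * i, 1) = -src[i].y;
        M.at<double>(2 * i, 2) =  1.0;
        // y' = b*x + a*y + d
        M.at<double>(2 * i + 1, 0) = src[i].y;
        M.at<double>(2 * i + 1, 1) = src[i].x;
        M.at<double>(2 * i + 1, 3) = 1.0;

        rhs.at<double>(2 * i, 0)     = dst[i].x;
        rhs.at<double>(2 * i + 1, 0) = dst[i].y;
    }

    cv::Mat p;                             // [a b c d]^T
    cv::solve(M, rhs, p, cv::DECOMP_SVD);  // least squares via SVD

    double a = p.at<double>(0), b = p.at<double>(1);
    double c = p.at<double>(2), d = p.at<double>(3);

    // Assemble the 2x3 transform matrix from the derivation above
    return (cv::Mat_<double>(2, 3) << a, -b, c, b, a, d);
}
```

If you want the angle and scale back, they follow from s = \sqrt{a^2 + b^2} and \theta = \operatorname{atan2}(b, a). In newer OpenCV versions the same functionality is exposed through cv::estimateAffinePartial2D (and cv::estimateAffine2D for the full case).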
Final remark
Well, that's it folks. Hopefully that gives you a better understanding of the 2D affine transform. So when should you use one or the other? I tend to use the partial affine when I don't want to overfit because the data has some physical constraint. On the plus side, it's a bit faster since there are fewer parameters to solve for. Let me know which one you use for your application! Best answer gets a free copy of OpenCV 3.x 🙂
Hi Nghia,
thanks for this nice overview. I was wondering why you state that the partial transform has 4 degrees of freedom. In my understanding, it adds up to 5 degrees of freedom. Moreover, do you have any idea how this function handles the estimation with more points than necessary?
Thanks,
Brian
Hi,
The partial transform (I think it's also called a similarity transform) has [theta, scale, tx, ty]. If you have more than the required number of points it ends up with a least squares solution.
Thank you! That makes sense! However, the OpenCV documentation (http://docs.opencv.org/2.4/modules/video/doc/motion_analysis_and_object_tracking.html#estimaterigidtransform) states that without the flag there are 5 degrees of freedom left. This is probably a typo, right? However, the estimation still needs three point pairs in that case, which is weird…
What do you think?
Thanks,
Brian
Hi Nghia,
very helpful blog post! I am trying to code up a live stabilization program in Python, based on your C++ code. As I use estimateRigidTransform() I see that it has scaling as well. Don't you think scaling also changes the translation parameters?
Best Regards, Greg
I haven't actually tried scaling. I got good results with translation and rotation, so I didn't look further.
The last line of the equations deriving the least squares approximation to the transform should be:
x = (A^{T} A)^{-1} A^{T} b
which is
x = A^{+} b
where the superscript + denotes the Moore–Penrose pseudoinverse.
Well spotted!
Actually it is quite subtle why the second derivation works, as it requires the fact that you can always find (theta, s) for any values of a and b that come out of your solution, such that a = cos(theta)s and b = sin(theta)s. This is not the case if you had the requirement s = 1, for instance. Even when s is free (as is the case), you need to show that a theta exists such that a*sin(theta) = b*cos(theta) for any a and b (assuming s != 0). This is in fact the case, but it's not especially trivial IMO.
It is a very interesting post, congratulation Nghia Ho!
I am interested in the point mentioned by monte_carlo_method. How can I find the best affine estimation considering s = 1, i.e. the rotation and translation that best match the two sets of points? Could you tell me a book or a paper where I can find it?
Following the second derivation, we have:
a = cos(\theta) s
b = sin(\theta) s
Can we find \theta by solving this system like
\frac{a}{\cos(\theta)} = \frac{b}{\sin(\theta)},
\theta = \arctan\left(\frac{b}{a}\right) ?
Thank you all!
Hello,
Is there a way to see the points OpenCV is using for computing the transformation? Also, is there a way of finding out how OpenCV is getting these points? Is it using a gradient?
Thanks in advance
This function uses all the points.
No, it uses `calcOpticalFlowPyrLK` internally.
calcOpticalFlowPyrLK is only used if you pass in an 8-bit image as input; otherwise it treats the inputs as a set of 2D points.
I believe there are 4 degrees of freedom if scaling is “uniform,” and there is no shear.
Thanks for the amazing post!
I just have one question. There are cases where cv2.estimateRigidTransform returns a None value. I do understand that it means "unable to find the transformation matrix". But I want to know/understand why it happens (e.g. the mathematics or whatever it is). Could you please share your thoughts on this matter? Thanks.
This probably means the matrix was not invertible, possibly because it is under-constrained. For example, if all your points are co-linear then the full 2D affine will fail.
I tried to use cv.estimateAffinePartial2D, but I always get an error. Has someone used it in Python who can share their code, please?
Posting your input data might help as well.
There is an error at AX = Y: a multiplication of a 2×3 matrix and a 3×1 matrix yields a 2×1 matrix, therefore Y_{1,1} is x' and Y_{2,1} is y', and there is no third entry. The rest works out, good post.
Fixed. Thanks!
Thank you my hero for explaining this for me. Helps a lot