OpenCV’s estimateRigidTransform is a pretty neat function with many uses. The function definition is

Mat `estimateRigidTransform`(InputArray **src**, InputArray **dst**, bool **fullAffine**)

The third parameter, **fullAffine**, is quite interesting. It allows the user to choose between a full affine transform, which has 6 degrees of freedom (rotation, translation, scaling, shearing) or a partial affine (rotation, translation, uniform scaling), which has 4 degrees of freedom. I’ve only ever used the full affine in the past but the second option comes in handy when you don’t need/want the extra degrees of freedom.

Anyone who has ever dug deep into OpenCV’s code to figure out how an algorithm works may notice the following:

- Code documentation for the algorithms is pretty much non-existent.

- The algorithm was probably written by some Soviet Russian theoretical physicists who thought it was good coding practice to write cryptic code that only a maths major can understand.

The above applies to cv::estimateRigidTransform to some degree. That function ends up calling the static function cv::getRTMatrix() in lkpyramid.cpp (what the heck?), where the maths is done.

In this post I’ll look at the maths behind the function and hopefully shed some light on how it works.

# Full 2D affine transform

The general 2D affine transform has 6 degrees of freedom and takes the form:
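
Written out with the six parameters [a, b, c, d, e, f] that are solved for later in this post (the placement of each letter in the matrix is an assumption based on that ordering):

$$
T = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}
$$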

This transform combines rotation, scaling, shearing, translation and, in some cases, reflection.

Solving for T requires a minimum of 3 point pairs (that aren’t degenerate, i.e. not all on one line!). This is straightforward to do. Let’s denote the input point as X = [x y 1] and the output as Y = [x’ y’ 1], giving:
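
Treating X and Y as column vectors (an assumption about the layout) and using the parameters [a, b, c, d, e, f], the relation Y = TX reads:

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$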

Expanding this gives
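
Multiplying out the first two rows of Y = TX, with the parameters named [a, b, c, d, e, f] row by row (an assumed ordering), the expansion should be:

$$
\begin{aligned}
x' &= a x + b y + c \\
y' &= d x + e y + f
\end{aligned}
$$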

We can rewrite this as a typical A**x** = **b** system and solve for **x**. Each pair of points contributes only 2 equations and there are 6 unknowns, so we need to introduce 2 extra pairs of points to be able to solve for **x**.
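
For three point pairs, the stacked system plausibly looks like the following. The subscripts 1 to 3 are notation introduced here to label the pairs, and the unknowns are assumed to be ordered [a, b, c, d, e, f]:

$$
\begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_1 & y_1 & 1 \\
x_2 & y_2 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_2 & y_2 & 1 \\
x_3 & y_3 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_3 & y_3 & 1
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}
=
\begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \end{bmatrix}
$$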

Now plug in your favourite linear solver to solve for [a, b, c, d, e, f].
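
As a rough illustration of that step, here is a minimal NumPy sketch that builds the 6x6 system from exactly three point pairs and solves it. This is not OpenCV's implementation; the function name `solve_full_affine` and the test transform are made up for the example, and the unknowns are assumed to be ordered [a, b, c, d, e, f].

```python
import numpy as np

def solve_full_affine(src, dst):
    """Solve for the 2x3 affine matrix from exactly 3 non-collinear point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)

    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src   # coefficients of a, b in the x' rows
    A[0::2, 2] = 1.0     # coefficient of c
    A[1::2, 3:5] = src   # coefficients of d, e in the y' rows
    A[1::2, 5] = 1.0     # coefficient of f
    rhs = dst.reshape(-1)  # [x1', y1', x2', y2', x3', y3']

    params = np.linalg.solve(A, rhs)  # fails if the points are collinear
    return params.reshape(2, 3)

# Hypothetical ground-truth transform used only to generate test data.
T_true = np.array([[1.2, 0.3, 5.0],
                   [-0.4, 0.9, -2.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = src @ T_true[:, :2].T + T_true[:, 2]

T = solve_full_affine(src, dst)
print(T)
```

`np.linalg.solve` is used here because the system is square; with more than three pairs a least-squares solver would be needed instead.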

If you have more than 3 pairs of points then you can find the least-squares solution with:
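
This is the standard normal-equations solution, written in the form given by the correction in the comments at the end of this post, where A⁺ denotes the Moore–Penrose pseudoinverse:

$$
\mathbf{x} = (A^{T} A)^{-1} A^{T} \mathbf{b} = A^{+} \mathbf{b}
$$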

# Partial 2D affine transform

The partial affine transform mentioned earlier has only 4 degrees of freedom: it excludes shearing, leaving only rotation, uniform scaling and translation. How do we do this? We start with the matrices for the transforms we are interested in.
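
In homogeneous coordinates these are the usual textbook matrices. The symbols are introduced here for illustration: θ is the rotation angle (counter-clockwise is assumed), s the uniform scale and (t_x, t_y) the translation.

$$
R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
S = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
T_t = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
$$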

Our partial affine transform is
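
Assuming scale and rotation are applied first and translation last (the order is an assumption; uniform scale and rotation commute, so only the position of the translation matters), with θ the rotation angle, s the uniform scale and (t_x, t_y) the translation:

$$
T =
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$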

Expanding gives
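
With θ the rotation angle, s the uniform scale and (t_x, t_y) the translation, and assuming translation is applied after rotation and scale, the product works out to:

$$
T = \begin{bmatrix}
s\cos\theta & -s\sin\theta & t_x \\
s\sin\theta & s\cos\theta & t_y \\
0 & 0 & 1
\end{bmatrix}
$$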

We can rewrite this matrix by defining
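
One natural choice that matches the unknowns [a, b, c, d] (the exact assignment is an assumption), with θ the rotation angle, s the uniform scale and (t_x, t_y) the translation:

$$
a = s\cos\theta, \quad b = s\sin\theta, \quad c = t_x, \quad d = t_y
\qquad\Rightarrow\qquad
T = \begin{bmatrix} a & -b & c \\ b & a & d \\ 0 & 0 & 1 \end{bmatrix}
$$

The transform is now linear in the unknowns, since x' = ax - by + c and y' = bx + ay + d.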

Solving for [a, b, c, d]
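
Assuming the parameterisation x' = ax - by + c and y' = bx + ay + d, two point pairs (labelled 1 and 2 here for illustration) give a 4x4 system:

$$
\begin{bmatrix}
x_1 & -y_1 & 1 & 0 \\
y_1 & x_1 & 0 & 1 \\
x_2 & -y_2 & 1 & 0 \\
y_2 & x_2 & 0 & 1
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
=
\begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \end{bmatrix}
$$

Under that same assumption the scale and angle can be recovered afterwards as

$$
s = \sqrt{a^2 + b^2}, \qquad \theta = \operatorname{atan2}(b, a)
$$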

Notice that for the partial affine transform we only need 2 pairs of points instead of 3, since each pair contributes 2 equations and there are only 4 unknowns.
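
To make this concrete, here is a small NumPy sketch that fits the four parameters from two or more point pairs using least squares. It is only an illustration, not what OpenCV does internally. The function name `estimate_partial_affine` is hypothetical, and the parameterisation x' = ax - by + c, y' = bx + ay + d is assumed.

```python
import numpy as np

def estimate_partial_affine(src, dst):
    """Least-squares fit of [a, b, c, d] in x' = a*x - b*y + c, y' = b*x + a*y + d."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    x, y = src[:, 0], src[:, 1]
    ones, zeros = np.ones(n), np.zeros(n)

    A = np.empty((2 * n, 4))
    A[0::2] = np.column_stack([x, -y, ones, zeros])  # rows for x'
    A[1::2] = np.column_stack([y, x, zeros, ones])   # rows for y'
    rhs = dst.reshape(-1)                            # [x1', y1', x2', y2', ...]

    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    a, b, c, d = params
    return np.array([[a, -b, c],
                     [b, a, d]])

# Hypothetical test data: rotate by 30 degrees, scale by 2, translate by (3, -1).
theta, scale, tx, ty = np.deg2rad(30.0), 2.0, 3.0, -1.0
T_true = np.array([
    [scale * np.cos(theta), -scale * np.sin(theta), tx],
    [scale * np.sin(theta),  scale * np.cos(theta), ty],
])
src = np.array([[0.0, 0.0], [1.0, 0.0]])  # just 2 point pairs
dst = src @ T_true[:, :2].T + T_true[:, 2]

T = estimate_partial_affine(src, dst)
est_scale = np.hypot(T[0, 0], T[1, 0])      # s = sqrt(a^2 + b^2)
est_theta = np.arctan2(T[1, 0], T[0, 0])    # theta = atan2(b, a)
print(T, est_scale, np.rad2deg(est_theta))
```

The same function accepts more than two pairs, in which case `np.linalg.lstsq` returns the least-squares fit.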

# Final remark

Well, that’s it folks. Hopefully that gives you a better understanding of the 2D affine transform. So when should you use one or the other? I tend to use the partial affine when I don’t want to overfit because the data has some physical constraint. On the plus side, it’s a bit faster since there are fewer parameters to solve for. Let me know which one you use for your application! Best answer gets a free copy of OpenCV 3.x 🙂

Hi Nghia,

thanks for this nice overview. I was wondering somehow, why you state the partial transform has 4 degrees of freedom. In my understanding, it adds up to 5 degrees of freedom. Moreover, do you have any idea how this function handles the estimation with more points than necessary?

Thanks,

Brian

Hi,

The partial transform (I think also called similarity transform) has [theta, scale, tx, ty]. If you have more than the required points it ends up with a least square solution.

Thank you! That makes sense! However, the OpenCV documentation (http://docs.opencv.org/2.4/modules/video/doc/motion_analysis_and_object_tracking.html#estimaterigidtransform) states that without the flag, there are 5 degrees of freedom left. This is probably a typo, right? However, the estimation still needs three point pairs in that case, which is weird…

What do you think?

Thanks,

Brian

Hi Nghia,

very helpful blog post! I am trying to code up a live stabilization program in Python, based on your C++ code. As I use estimateRigidTransform() I see that it has scaling as well. Don’t you think scaling also changes the translation parameters?

Best Regards, Greg

I haven’t tried scaling actually. I got good enough results with translation and rotation that I didn’t look further.

The last line of the equations deriving the least squares approximation to the transform should be:

x = (A^{T} A)^{-1} A^{T} b

which is

x = A^{+} b

where the superscript + denotes the Moore–Penrose pseudoinverse.

Well spotted!