Understanding OpenCV cv::estimateRigidTransform

OpenCV’s estimateRigidTransform is a pretty neat function with many uses. The function definition is

Mat estimateRigidTransform(InputArray src, InputArray dst, bool fullAffine)

The third parameter, fullAffine, is quite interesting. It allows the user to choose between a full affine transform, which has 6 degrees of freedom (rotation, translation, scaling, shearing) or a partial affine (rotation, translation, uniform scaling), which has 4 degrees of freedom. I’ve only ever used the full affine in the past but the second option comes in handy when you don’t need/want the extra degrees of freedom.

Anyone who has ever dug deep into OpenCV’s code to figure out how an algorithm works may notice the following:

  • Code documentation for the algorithms is pretty much non-existent.
  • The algorithm was probably written by some soviet Russian theoretical physicists who thought it was good coding practice to write cryptic code that only a maths major can understand.

The above applies to the cv::estimateRigidTransform to some degree. That function ends up calling static cv::getRTMatrix() in lkpyramid.cpp (what the heck?), where the maths is done.

In this post  I’ll look at the maths behind the function and hopefully shed some light on how it works.

Full 2D affine transform

The general 2D affine transform has 6 degree of freedoms of the form:

T = \left[ \begin{array}{ccc} a & b & c \\ d & e & f \end{array} \right]

This transform combines rotation, scaling, shearing, translation and reflection in some cases.

Solving for T requires a minimum of 3 pairing points (that aren’t degenerate!). This is straight forward to do. Let’s denote the input point to be X= [x y 1] and the output to be Y = [x’ y’ 1], giving:

TX = Y

\left[ \begin{array}{ccc} a & b & c \\ d & e & f \end{array} \right] \left[\begin{array}{c} x \\ y \\ 1\end{array}\right] = \left[\begin{array}{c} x' \\ y' \\ 1\end{array}\right]

Expanding this gives

\begin{array}{c} {ax + by + c = x'} \\ {dx + ey + f = y'}\end{array}

We can re-write this as a typical Ax = b matrix and solve for x. We’ll also need to introduce 2 extra pair of points to be able to solve for x.

\left[\begin{array}{cccccc} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_2 & y_2 & 1 \\ x_3 & y_3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_3 & y_3 & 1\end{array}\right]\left[\begin{array}{c} a \\ b \\ c \\ d \\ e \\ f \end{array} \right] = \left[\begin{array}{c} x_1 \\ y_1 \\ x_2 \\ y_2 \\ x_3 \\ y_3 \end{array} \right]

Now plug in your favourite linear solver to solve for [a, b, c, d, e, f].

If you have more than 3 pair of points then you can do least squares by doing:

\begin{array}{ccc} Ax & = & b \\ A{^T}Ax & = & A^{T}b \\x & =& (A{^T}A)^{-1}A{^T}b \end{array}

Partial 2D affine transform

The partial affine transform mentioned early has a reduced degree of freedom of 4 by excluding shearing leaving only rotation, uniform scaling and translation. How do we do this? We start with the matrices for the transforms we are interested in.

R (rotation) = \left[\begin{array}{cc} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta)\end{array}\right]

S (scaling) = \left[\begin{array}{cc} s & 0 \\ 0 & s \end{array}\right]

t (translation) = \left[\begin{array}{c} tx \\ ty \end{array}\right]

Our partial affine transform is

T = \left[RS | t \right]

Expanding gives

T = \left[ \begin{array}{ccc} \cos(\theta)s & -\sin(\theta)s & tx \\ \sin(\theta)s & \cos(\theta)s & ty\end{array} \right]

We can rewrite this matrix by defining

a = \cos(\theta)s

b = \sin(\theta)s

c = tx

d = ty

T = \left[ \begin{array}{ccc} a & -b & c \\ b & a & d\end{array} \right]

Solving for [a, b, c, d]

TX = Y

\left[ \begin{array}{ccc} a & -b & c \\ b & a & d \end{array} \right] \left[\begin{array}{c} x \\ y \\ 1\end{array}\right] = \left[\begin{array}{c} x' \\ y' \\ 1\end{array}\right]

\begin{array}{c} {ax- by + c = x'} \\ {bx + ay + d = y'}\end{array}

Solving for [a, b, c, d]

\left[\begin{array}{cccc} x_1 & -y_1 & 1 & 0 \\ y_1 & x_1 & 0 & 1 \\ x_2 & -y_2 & 1 & 0 \\ y_2 & x_2 & 0 & 1 \end{array}\right]\left[\begin{array}{c} a \\ b \\ c \\ d \end{array} \right] = \left[\begin{array}{c} x_1 \\ y_1 \\ x_2 \\ y_2 \end{array} \right]

Notice for the partial affine transform we only need 2 pair of points instead of 3.

Final remark

Well, that’s it folks. Hopefully that gives you a better understanding of the 2D affine transform. So when should you use one or the other? I tend to the use the partial affine when I don’t want to overfit because the data has some physical constraint. On the plus side, it’s a bit faster since there are less parameters to solve for. Let me know which one you use for your application! Best answer gets a free copy of OpenCV 3.x đŸ™‚