Last Updated on April 26, 2021 by nghiaho12
OpenCV’s estimateRigidTransform is a pretty neat function with many uses. The function definition is
Mat estimateRigidTransform(InputArray src, InputArray dst, bool fullAffine)
The third parameter, fullAffine, is quite interesting. It allows the user to choose between a full affine transform, which has 6 degrees of freedom (rotation, translation, scaling, shearing), and a partial affine (rotation, translation, uniform scaling), which has 4 degrees of freedom. I've only ever used the full affine in the past, but the second option comes in handy when you don't need/want the extra degrees of freedom.
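As a quick illustration, here is a minimal sketch of calling it both ways. The point values are made up for the example; the header path is the OpenCV 2.4/3.x video module, where the function lives.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>  // cv::estimateRigidTransform
#include <iostream>
#include <vector>

int main() {
    // Hypothetical point correspondences (src -> dst)
    std::vector<cv::Point2f> src = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
    std::vector<cv::Point2f> dst = {{10, 10}, {12, 10}, {10, 12}, {12, 12}};

    // Full affine: 6 DOF (rotation, translation, scaling, shearing)
    cv::Mat full = cv::estimateRigidTransform(src, dst, true);

    // Partial affine: 4 DOF (rotation, translation, uniform scaling)
    cv::Mat partial = cv::estimateRigidTransform(src, dst, false);

    // Both return a 2x3 matrix (empty if the estimation failed)
    std::cout << full << "\n" << partial << "\n";
    return 0;
}
```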
Anyone who has ever dug deep into OpenCV’s code to figure out how an algorithm works may notice the following:
- Code documentation for the algorithms is pretty much non-existent.
- The algorithm was probably written by some Soviet Russian theoretical physicists who thought it was good coding practice to write cryptic code that only a maths major can understand.
The above applies to cv::estimateRigidTransform to some degree. That function ends up calling the static cv::getRTMatrix() in lkpyramid.cpp (what the heck?), where the maths is done.
In this post I’ll look at the maths behind the function and hopefully shed some light on how it works.
Full 2D affine transform
The general 2D affine transform has 6 degrees of freedom and is of the form:
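A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}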
This transform combines rotation, scaling, shearing, translation and, in some cases, reflection.
Solving for A requires a minimum of 3 point pairs (that aren't degenerate!). This is straightforward to do. Let's denote the input point to be X = [x y 1]^T and the output to be Y = [x' y']^T, giving:
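AX = Y

\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}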
Expanding this gives
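\begin{aligned} x' &= ax + by + c \\ y' &= dx + ey + f \end{aligned}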
We can re-write this as a typical Ax = b problem and solve for x. We'll also need to introduce 2 extra pairs of points to be able to solve for x.
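\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_2 & y_2 & 1 \\ x_3 & y_3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_3 & y_3 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ x'_2 \\ y'_2 \\ x'_3 \\ y'_3 \end{bmatrix}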
Now plug in your favourite linear solver to solve for [a, b, c, d, e, f]. If you have 3 or more points, a simple least squares solution can be obtained via the pseudo-inverse:
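x = \left(A^{T} A\right)^{-1} A^{T} b

If you want to see this in code, here is a minimal sketch (my own, not OpenCV's implementation) that builds the stacked system above for N ≥ 3 correspondences and solves it in a least squares sense with cv::solve:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Least squares full 2D affine from N >= 3 point pairs (a sketch, not OpenCV's code)
cv::Mat fullAffineLeastSquares(const std::vector<cv::Point2f>& src,
                               const std::vector<cv::Point2f>& dst) {
    int n = static_cast<int>(src.size());
    cv::Mat A = cv::Mat::zeros(2 * n, 6, CV_64F);
    cv::Mat b(2 * n, 1, CV_64F);

    for (int i = 0; i < n; i++) {
        // row for x' = a*x + b*y + c
        A.at<double>(2 * i, 0) = src[i].x;
        A.at<double>(2 * i, 1) = src[i].y;
        A.at<double>(2 * i, 2) = 1.0;
        // row for y' = d*x + e*y + f
        A.at<double>(2 * i + 1, 3) = src[i].x;
        A.at<double>(2 * i + 1, 4) = src[i].y;
        A.at<double>(2 * i + 1, 5) = 1.0;

        b.at<double>(2 * i, 0)     = dst[i].x;
        b.at<double>(2 * i + 1, 0) = dst[i].y;
    }

    cv::Mat x;                           // [a b c d e f]^T
    cv::solve(A, b, x, cv::DECOMP_SVD);  // least squares via SVD
    return x.reshape(1, 2);              // reshape into the 2x3 transform matrix
}
```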
Partial 2D affine transform
The partial affine transform mentioned earlier has a reduced 4 degrees of freedom: it excludes shearing (and non-uniform scaling), leaving only rotation, uniform scaling and translation. How do we do this? We start with the matrices for the transforms we are interested in.
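R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \;\text{(rotation)}, \qquad S = \begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix} \;\text{(uniform scaling)}, \qquad t = \begin{bmatrix} t_x \\ t_y \end{bmatrix} \;\text{(translation)}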
Our partial affine transform is
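\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} sR & t \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}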
Expanding gives
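\begin{aligned} x' &= s\cos\theta \, x - s\sin\theta \, y + t_x \\ y' &= s\sin\theta \, x + s\cos\theta \, y + t_y \end{aligned}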
We can rewrite this matrix by defining
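a = s\cos\theta, \quad b = s\sin\theta, \quad c = t_x, \quad d = t_y

which gives

\begin{aligned} x' &= ax - by + c \\ y' &= bx + ay + d \end{aligned}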
Solving for [a, b, c, d]
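\begin{bmatrix} x_1 & -y_1 & 1 & 0 \\ y_1 & x_1 & 0 & 1 \\ x_2 & -y_2 & 1 & 0 \\ y_2 & x_2 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ x'_2 \\ y'_2 \end{bmatrix}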
Notice that for the partial affine transform we only need 2 pairs of points instead of 3.
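To make the derivation concrete, here is a minimal sketch (my own code, not OpenCV's internal getRTMatrix()) that solves this system in a least squares sense for N ≥ 2 correspondences using cv::solve:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Least squares partial 2D affine ([a b c d]) from N >= 2 point pairs (a sketch)
cv::Mat partialAffineLeastSquares(const std::vector<cv::Point2f>& src,
                                  const std::vector<cv::Point2f>& dst) {
    int n = static_cast<int>(src.size());
    cv::Mat M = cv::Mat::zeros(2 * n, 4, CV_64F);
    cv::Mat rhs(2 * n, 1, CV_64F);

    for (int i = 0; i < n; i++) {
        // x' = a*x - b*y + c
        M.at<double>(2 * i, 0) =  src[i].x;
        M.at<double>(2 * i, 1) = -src[i].y;
        M.at<double>(2 * i, 2) =  1.0;
        // y' = b*x + a*y + d
        M.at<double>(2 * i + 1, 0) = src[i].y;
        M.at<double>(2 * i + 1, 1) = src[i].x;
        M.at<double>(2 * i + 1, 3) = 1.0;

        rhs.at<double>(2 * i, 0)     = dst[i].x;
        rhs.at<double>(2 * i + 1, 0) = dst[i].y;
    }

    cv::Mat p;                             // [a b c d]^T
    cv::solve(M, rhs, p, cv::DECOMP_SVD);  // least squares via SVD

    double a = p.at<double>(0), b = p.at<double>(1);
    double c = p.at<double>(2), d = p.at<double>(3);

    // Assemble the 2x3 transform matrix from the derivation above
    return (cv::Mat_<double>(2, 3) << a, -b, c, b, a, d);
}
```

If you want the angle and scale back, they follow from s = \sqrt{a^2 + b^2} and \theta = \operatorname{atan2}(b, a). In newer OpenCV versions the same functionality is exposed through cv::estimateAffinePartial2D (and cv::estimateAffine2D for the full case).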
Final remark
Well, that's it folks. Hopefully that gives you a better understanding of the 2D affine transform. So when should you use one or the other? I tend to use the partial affine when I don't want to overfit because the data has some physical constraint. On the plus side, it's a bit faster since there are fewer parameters to solve for. Let me know which one you use for your application! Best answer gets a free copy of OpenCV 3.x 🙂
Hi Nghia,
thanks for this nice overview. I was wondering why you state that the partial transform has 4 degrees of freedom. In my understanding, it adds up to 5 degrees of freedom. Moreover, do you have any idea how this function handles the estimation with more points than necessary?
Thanks,
Brian
Hi,
The partial transform (I think it's also called a similarity transform) has [theta, scale, tx, ty]. If you have more than the required number of points it ends up with a least squares solution.
Thank you! That makes sense! However, the OpenCV documentation (http://docs.opencv.org/2.4/modules/video/doc/motion_analysis_and_object_tracking.html#estimaterigidtransform) states that without the flag there are 5 degrees of freedom left. This is probably a typo, right? However, the estimation still needs three point pairs in that case, which is weird…
What do you think?
Thanks,
Brian
Hi Nghia,
very helpful blog post! I am trying to code up a live stabilization program in Python, based on your C++ code. As I use estimateRigidTransform() I see that it has scaling as well. Don't you think scaling also changes the translation parameters?
Best Regards, Greg
I haven't actually tried scaling. I got good results with translation and rotation, so I didn't look further.
The last line of the equations deriving the least squares approximation to the transform should be:
x = (A^{T} A)^{-1} A^{T} b
which is
x = A^{+} b
where the superscript + denotes the Moore–Penrose pseudoinverse.
Well spotted!
Actually it is quite subtle why the second derivation works, as it requires the fact that you can always find (theta, s) for any values of a and b that come out of your solution, such that a = cos(theta)s and b = sin(theta)s. This is not the case if you had the requirement s = 1, for instance. Even when s is free (as is the case), you need to show that a theta exists such that a*sin(theta) = b*cos(theta) for any a and b (assuming s != 0). This is in fact the case, but it's not especially trivial IMO.
It is a very interesting post, congratulation Nghia Ho!
I am interested in the point mentioned by monte_carlo_method. How can I find the best affine estimation considering s = 1, i.e. the rotation and translation that best match the two sets of points? Could you tell me a book or a paper where I can find it?
Following the second derivation, we have:
a = cos(\theta) s
b = sin(\theta) s
Can we find \theta by solving this system like
\frac{a}{\cos(\theta)} = \frac{b}{\sin(\theta)},
\theta = \arctan\left(\frac{b}{a}\right) ?
Thank you all!
Hello,
Is there a way to see the points OpenCV is using for computing the transformation? Also, is there a way of finding out how OpenCV is getting these points? Is it using a gradient?
Thanks in advance
This function uses all the points.
No, it uses `calcOpticalFlowPyrLK` internally.
calcOpticalFlowPyrLK is only used if you pass in an 8-bit image as input; otherwise it treats the inputs as a set of 2D points.
I believe there are 4 degrees of freedom if scaling is “uniform,” and there is no shear.
Thanks for the amazing post!
I just have one question. There are cases where cv2.estimateRigidTransform returns a None value. I do understand that it means "unable to find the transformation matrix". But I want to know/understand why it happens (e.g. the mathematics or whatever it is). Could you please share your thoughts on this matter? Thanks.
This probably means the matrix was not invertible, possibly because it is under-constrained. For example, if all your points are co-linear then the full 2D affine will fail.
I tried to use cv.estimateAffinePartial2D, but I always get an error. Has someone used it in Python who can share their code, please?
Posting your input data might help as well.
There is an error at AX = Y: a multiplication of a 2×3 matrix and a 3×1 matrix yields a 2×1 matrix, therefore Y_{1,1} is x' and Y_{2,1} is y', and there is no third entry. The rest works out, good post.
Fixed. Thanks!
Thank you my hero for explaining this for me. Helps a lot