In 2011, some time around April, I had a motorcycle accident which left me with a broken right hand. For the next few weeks I was forced to take public transport to work, *shudder* (it takes me twice as long to get to work using Melbourne’s public transport than driving). I killed time on the train reading comic books and academic papers. One interesting one I came across was Five-Point Algorithm Made Easy by Hongdong Li and Richard Hartley for calculating the essential matrix using five points (minimum). Now here is something you don’t see everyday in an academic paper title, **‘easy’**. I’ve always be meaning to get around to learning the five-point algorithm and this one boasts an easier implementation than David Nister’s version. Seemed like the perfect opportunity to try it out, I mean how hard could it be? Well …

Over **one year later** of on and off programming I finally finished implementing this damn algorithm! A fair bit of time was spent on figuring out how to implement symbolic matrices in C++. I was up for using existing open source symbolic packages but found that they all struggled to calculate the determinant of a 10×10 symbolic matrix of polynomials. If you do the maths, doing it the naive way is excruciatingly slow, at least in the orders of 10 factorial. I ended up writing my own simple symbolic matrix class. I also wrote a Maxima and PHP script to generate parts of the symbolic maths step into C++ code, fully expanded out in all its glory. The end result is a rather horrendous looking header file with 202 lines of spaghetti code (quite proud of that one actually).

The second major road black was actually a one liner bug in my code. I was doing the SVD of a 5×9 matrix in OpenCV. By default OpenCV’s SVD doesn’t do the full decomposition for Vt unless you tell it to eg. cv::SVD svd(F, cv::SVD::FULL_UV). So I was using the 5×9 version of Vt instead of 9×9 to grab the null basis, which was incorrect.

It was quite a hellish journey, but in the end I learnt something new and here’s the code for you all to enjoy.

# Download

Last update: 25th August 2013

5point-0.1.5.tar.gz (requires OpenCV)

5Point-Eigen-0.1.5.tar.gz (requires Eigen)

The OpenCV and Eigen version produce identical output. The Eigen version was ported by Stuart Nixon. The file 5point.cpp in the Eigen version has a special #ifdef ROBUST_TEST that you can turn on to enable a more robust test against outliers. I have not thoroughly test this feature so it is turned off by default.

Given 5 or more points, the algorithm will calculate all possible essential matrix solutions and return the correct one(s) based on depth testing. As a bonus it even returns the 3×4 projection matrix. The whole process takes like 0.4 milliseconds on my computer. There’s a lot of room for speed improvement using more efficient maths. David Nister’s paper list some improvements in the appendix.

Thank you very much for your code, Ill try it right now.

Hi thank you for contributing such a nice code.

I think I caught a severe bug in your code.

in 5point.cpp, there is a line like

pt3d = TriangulatePoint(cv::Point2d(pts1[0], pts1[1]), cv::Point2d(pts2[0], pts2[2]), P_ref, P[j]);

I suspect it should be

pt3d = TriangulatePoint(cv::Point2d(pts1[0], pts1[1]), cv::Point2d(pts2[0], pts2[1]), P_ref, P[j]);

The difference is the second index of pts2

Hi,

Well spotted! Fixed now.

Hi,

trying to use your functions.

I noticed that when you create the projection matrices you do not check if the rotation matrix is a mirroring rotation matrix (det < 0). Is that something you have though about?

Hi,

Good point totally forgot about that. I’ll update the code now.

I’m flicking through the multiple view geometry book and it doesn’t seem to bring up the issue. Are you seeing det < 0 ?

This is the result I get when compiling your program with the test points in the main.cpp, octave shows that the return P is a mirroring projection matrix:

P =

-0.30440 0.16280 -0.93853 0.44225

0.93299 0.24959 -0.25931 -0.51762

-0.19203 0.95457 0.22786 -0.73245

octave:2> R = P(:,1:3)

R =

-0.30440 0.16280 -0.93853

0.93299 0.24959 -0.25931

-0.19203 0.95457 0.22786

octave:3> det(R)

ans = -1.00000

octave:4>

Be sure to check out five-point.cpp in the master branch of opencv;

if (determinant(U) < 0) U *= -1.;

if (determinant(Vt) < 0) Vt *= -1.;

I’ve used this version in the past with success

if determinant(R) < 0

multiply 3rd column of R by -1

end if

HI

Thank you very much for your implementation because i am trying to convert my application from Matlab to OpenCV. In Matlab version, i used code from

http://vis.uky.edu/~stewe/FIVEPOINT/

However, the result might not exactly the same.

When i executed your sample (num_pts = 5), the result is

E=[

1.0161 -0.4041 -0.3996

0.3996 -0.7028 0.7615

0.3310 0.2526 -0.7795

]

Is this result 100% correct?

Besides, what does num_pts mean?

How it will influence the result?

Moreover, i tried to solve your sample with Nister’s code.

The result is

E =

0.0553 -0.6078 0.3496

0.3687 -0.0010 0.0642

-0.5916 -0.1183 -0.0360

and it needs 10 points.

I don’t quite understand the difference or the difference could be ignored?

num_pts is “number of points”. My sample code has 10 pts but I tested it on the first 5 just to confirm it works with the bare minimum. I like to think my code is solid, keyword “think” There does seem to be a big difference in the E matrix. How many E matrix does their method return?

You implementation are different from the 5 point algorithm in bundler too. The result from bundler are similar to Nister’s code,but much different from you. So I think your code may have some bugs.

Can you post the data and results so I can compare?

If I want to use your code into my solution then do I need to change only the corresponding points?

5point.mac will work as it is or I need to modify that?

You only need to change the corresponding points, don’t need to touch 5point.mac.

Does there any way to get the Translation Matrix (t) Rotatinal Matrix (R) from the output of your code?

It returns a 3×4 projection matrix which has what you want.

Is the output gives the rotational and translation matrix for pts2 (2nd camera position)?

No, the 3×4 projection matrix transforms a 3D point so that it’s relative to the camera’s view. If you want the camera’s position the formula is -R’ * t . Where R is the first 3×3 from the 3×4 matrix, t is the 4th column.

The formulas can be found here http://www.cs.cornell.edu/~snavely/bundler/bundler-v0.4-manual.html.

Does the code calculate P and E from only the corresponding points? Does this mean we can retrieve the calibration matrix K from decomposing P calculated from the 5 point algorithm?

The algorithm assumes you give it normalised image points, meaning you are expected to know K before hand already. The P matrix returned does not include K.

Hello,

Me too, i think that you have some bugs in your implementation of 5 pts algorithm because I compared the result of your code with the 5 points implementation in Matlab and it was not same..

Can you send me some data so I can debug it?

Hi, thanks for your code!!

But I have found an issue when choosing the Projection matrix with the most points in front of the camera, which could be why some people find a bug.

In fact in Opencv when you do:

ret_E = E;

and

ret_P = P[j];

it puts only pointers on these fields, and the value can update itself while you don’t want to. I replaced it by:

E.copyTo(ret_E);

and

P[j].copyTo(ret_P);

And it works much better!!!

It might be something that has changed with the newer versions of Opencv.

Thanks again for your awesome work!!

Bye

Ah that silly matrix problem gets me every time in OpenCV! Thanks for spotting that one. Updated the code now.

On a more general note I’ve added a print out when it detects a reflection in the rotation matrix. Can someone confirm I’m doing the right thing to fix it. Else I’ll leave the annoying print out until it gets resolved

Hey again,

About the reflections I had this issue too, especially with rotation but with translation too sometimes!

I still think that there is a possible inversion in the Essential matrix, because the ProjectionsFromEssential function is correct.

For the moment I just use heuristics, if you find the solution I am fan!

Otherwise I was wondering:

For the moment when you choose between the Essential matrices to have the best one, you pick the one that leads to a projection matrix leaving the most points in front of the camera. But with the possibles reflection, this method can sometimes fail: I printed out all the projections matrices with ground truth known, and the right one was discarded.

I still don’t know how to solve this in a general fashion, I use heuristics for my problem and then pick the best one. An idea could be to minimize of form of criterion rather than just “number in front of the camera” but I can’t figure out a good way to do it except Sampson error…

I’ll keep you informed if I find a better way to set this up!!

Hey,

I wonder if it’s possible to apply the rotation “correction” prior to doing the test to find the best one. That way you’re always testing valid matrices.

Are you taking the camera calibration parameters, i mean the K matrix ?

I refer to K as the 3×3 intrinsic camera matrix, these are the calibrated parameters. This algorithm does not find or return K.

I have 2 different cameras with their intrinsic parameters (2 different K matrix). Also I have correspondence points from both the cameras. Now to use this code where should I initialize these intrinsic parameters (K matrix for both cameras)?

Apply them to the points before running the code eg.

ptn1 = K1*pt1

ptn2 = K2*pt2

ptn1, ptn2 becomes input to the code

Correction, K1 should be K1^(-1) (inverse of K1), same for K2.

how should I calculate inverse(K1)*pt1?

inverse(K1) is a 3×3 matrix and pt1 is in of form (x,y) values. will it be a matrix multiplication? I guess this is not a scalar multiplication..

Shall it be like below code..

Point3d u1(pt1.x, pt1.y, 1.0);

Mat_ um1 = K1inv * Mat_(u1);

u1.x = um1(0); u1.y = um1(1); u1.z = um1(2);

ptn1[] = {u1.x/u1.z, u1.y/u1.z};

The 2D point first converted to a 3D point with z value 1 and then after multiply and get back the 2D by x/z and y/z

This is correct. The division is redundant because it’ll be dividing by 1 all the time.

One more question.

I have a set of around 1000 correspondence points between 2 images. With different set of 5 points out of these 1000 points, I get different output of E and P. How do you suggest to take an average output of these?

Use the RANSAC framework.

1. Randomly pick a set of 5 points

2. Calculate E

3. Apply E to all the points and see how well they “fit” (up to you to define, check Sampson distance)

4. Keep track of which E produces the best fit

5. Repeat until satisfied

You can do one extra step and refine it further by taking all the points that “fit” well and calculate E using all the points (instead of only 5), this will find an average E.

Can you please explain how to apply the E to all other points?

And how to calculate E using all “fit” points..

My function already handles using points to find E:

bool Solve5PointEssential(double *pts1, double *pts2, int num_pts, cv::Mat &ret_E, cv::Mat &ret_P, int &best_inliers)

set num_pts to how many points you have.

I’m not having the same results as Stewenius matlab version (http://vis.uky.edu/~stewe/FIVEPOINT/) for this dataset :

-1.84061 -2.66844

-0.907628 -2.73298

-1.59856 -2.32285

-2.20069 -2.7138

-1.41456 -2.11016

-1.22827 -2.35542

-1.45023 -2.11384

-1.55311 -2.32163

-1.78152 -2.57716

-2.24741 -2.19773

-1.0908 -2.26306

-1.77424 -1.77768

-1.45032 -2.31666

-0.839 -2.44396

-1.71278 -2.31398

-1.66077 -2.34513

-1.64307 -2.4623

-1.46829 -2.35682

-1.17333 -2.3116

-1.09173 -2.63488

The points are already normalized. Do you experience the same results ?

Nevermind… again… managed to make it work.

Thank you for your great work.

How did you fix it?

Hello,

I’m just looking at your 5 point code – looks very promising. Thanks for releasing this. Just for your information, I’m getting the following gcc compile warning, which I’ve yet to look into.

src/5point/Rpoly.cpp: In function ‘void RealIT_ak1(int*, int*, double*, int, double*, int, double*, double*, double*, double*, double*)':

src/5point/Rpoly.cpp:660:13: warning: ‘omp’ may be used uninitialized in this function [-Wuninitialized]

src/5point/Rpoly.cpp: In function ‘void QuadIT_ak1(int, int*, double, double, double*, double*, double*, double*, double*, int, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*)':

src/5point/Rpoly.cpp:569:13: warning: ‘omp’ may be used uninitialized in this function [-Wuninitialized]

It should be a harmless warning because omp gets used after it is initialised. Just follow the execution path.

Nghia,

As you note, the initialized value warnings are OK; I’ve set the values to 0 to shut the warnings up.

I’ve been testing your 5 point code using RANSAC and comparing with a normalized 8 point algorithm with known correct test data. I’ve run into some issues, and have yet to sort them out. I’m happy to send you some test data and further comments; perhaps you could contact me via email (assuming you can access my email address I used to post this message).

Some of the issues I’ve found are that:

– The reflection test code is not quite right in my test case – one needs to invert 00, 01, 10, 11 and 22 values in the R matrix to get the correct results for my test data.

– The 3rd value (Z) in the translation is inverted from the correct value (as extracted from the P matrix).

– I am not getting the correct E matrix out so RANSAC is failing.

– The polynomial code sometimes reports unable to converge after 20 shifts (perhaps because of bad random points selected for RANSAC).

I suspect the 2nd issue (inverted Z value for translation in P) is also present in the E matrix, as I’m still not getting correct E matrix values out. I’ll continue to dig into this, but it would be helpful to continue this discussion via email and to give you some test data to see what you think of it all.

I’m interested in resolving these problems, as your code is clean and fast.

Kind regards,

Stuart

Did you enter a legitimate email address? Your email domain does not exist.

Hi. That should be a valid email domain, but try the one I used to post this message instead.

I’m beginning to wonder if the problem might be related to RHS vs LHS coordinate system (I use photogrammetry style right-handed coordinate system not OpenCV style left handed coordinates). I’m building test cases to try and narrow the problem down. I’m also looking to swap your code over to Eigen3 for matrix work, which should make it even faster and remove the OpenCV dependancy (I’m leaving that until I get things working as expected first). Be good to continue discussions via email to chat about the issues I’m seeing and to provide you with some test data via email.

Cheers,

Stuart

Hi, I test your code for 2 pictures (of redskirt dataset).

I manually select 8 matching:

double pts1[] = {548, 119,

535,276,

615,330,

596,283,

585,175,

578,662,

706,313,

460,260

};

double pts2[] = {507,110,

491,263,

556,326,

547,277,

530,168,

527,653,

644,321,

462,242

};

This is the result:

Average execution time: 717.139 us

Success! Found 2 possible solutions

The best one has the highest inliers. An inlier is a point that is in front of both cameras.

Solution 1/2

E = [-0.001483281169646533, 0.0798129447617992, -16.00316603240302;

-0.0728808162683079, 0.0001959859646169806, 50.90835405921668;

15.58680315258395, -51.03736078741671, -0.002534495378497037]

P = [0.9999666308722415, -0.008168190689744725, -0.0001334274124399082, -0.9539744194915745;

0.008168191808671788, 0.9999666397340765, 7.843250090652126e-06, -0.2998842742424013;

0.0001333588961035814, -8.932849065581964e-06, 0.9999999910678045, -0.001492989558391173]

inliers = 7/8

=========================================

Solution 2/2

E = [-0.5049975712191607, 0.9597433748533721, 136.6575441673005;

0.2489707601260553, -0.02233064866215852, -313.4228545544773;

131.6783043503389, -315.5458153141706, 0.5385152633007909]

Detected a reflection in P. Multiplying all values by -1.

P = [-0.6920270232728676, -0.7218672225694188, 0.002512377361682321, 0.9166507454123589;

-0.7218704435458069, 0.6920279437574728, -0.0006227309082056209, 0.3996796151553508;

-0.001289106308433324, -0.002244557577137604, -0.9999966500774935, 0.002759739885880337]

inliers = 8/8

=========================================

dataset’s given projection matrixes for these images:

1.000000 0.000000 0.000000 0.000000

0.000000 1.000000 0.000000 0.000000

0.000000 0.000000 1.000000 0.000000

0.954262 -0.140546 0.263872 -6.723805

0.144974 0.989431 0.002719 -0.643127

-0.261466 0.035661 0.964552 1.309906

would you pls help me find out what’s wrong with it?

You need to normalise your input values using the 3×3 K camera matrix first.

thanks for reply, I’ve normalized points using K: as below

K * [x y 1]’

K = [891.913000 2.962990 507.609000

0.0 894.045000 379.563000

0.0 0.0 1.0]

these are my new points:

double pts1[] = {532777.943232259,144497.667369159,

547476.696029767,287099.237462617,

476204.159187399,244318.766434579,

534573.913560981,257152.907742991,

579932.187116122,207242.358210280,

437656.496297928,202964.311107477,

518613.145600857,592266.597462617,

424507.876486838,528095.890920561};

double pts2[] = {489204.057059098,130628.554880249,

493894.040760498,283037.472479005,

439792.043870918,237457.235440125,

488120.402290824,250276.677107310,

541961.852493002,203272.057660964,

422623.448063764,193301.380808709,

473438.404953344,585006.542861586,

399355.148617418,505241.128043546};

when I run it, it Could not find a valid essential matrix. why!?

Multiply by inverse of K.

How distorted is your camera? Most calibrations have a skew of 0 at K(1,2), you got 2.962.

thanks, I’ve normalized points using inv(K).

but I get different result for E and P from your code and Nistér code.

normalized points:

double pts1[] = {0.0912160582538460,-0.237107884256455,

-0.00231531248372204,-0.0747598844963950,

0.131778189575707,-0.0658396647293588,

0.0768606206761283,0.315945741299793,

0.101128023940679,0.00909018131374581,

-0.0319772746917338,0.241015895256689,

0.0309987576756805,0.125053038285217,

0.230164789187762,-0.0747598844963950};

double pts2[] = {0.0460994198507981,-0.251058163244206,

-0.0293170010246510,-0.101492930055461,

0.0777373608926701,-0.0730043142099859,

0.0356087889251775,0.302689307252219,

0.0721846530087127,-0.00356331308664002,

-0.0534139092718004,0.211881844244767,

0.00402612144282850,0.105049534824234,

0.170592510940855,-0.074784852700328};

——————————————————————————–

E from nister code:

E

0.0130 -0.5186 -0.2433

0.5335 0.0176 -0.4023

0.2246 0.4175 0.0152

determinant constraint

-6.3786e-016

trace constraint

1.0e-014 *

-0.3792 -0.1887 0.3025

-0.1887 0.4805 0.3109

0.3136 0.3109 0.0243

reprojection errors

0.0011

-0.0019

-0.0009

-0.0006

0.0023

0.0001

0.0009

-0.0009

E

-0.0106 0.1304 0.0865

-0.2081 -0.0313 -0.6695

-0.1939 0.6672 0.0122

determinant constraint

-2.1357e-015

trace constraint

1.0e-013 *

0.0444 0.0289 0.0225

0.0447 -0.1676 -0.0333

0.0519 -0.0233 0.1656

reprojection errors

-0.0003

-0.0007

0.0034

0.0000

-0.0018

-0.0000

0.0004

-0.0010

——————————————————————————–

your code result:

Success! Found 2 possible solutions

The best one has the highest inliers. An inlier is a point that is in front of both cameras.

Solution 1/2

E = [0.01148409963975677, 0.2256736888650271, 0.2102266638882955;

-0.1414274834444046, 0.03390718827838227, -0.7235860545601634;

-0.0937650329119874, 0.7260337842465426, -0.01320226081599484]

Detected a reflection in P. Multiplying all values by -1.

P = [0.9806301797531383, -0.1578850635032568, 0.1159170275667039, -0.9154292901533225;

0.1571445712945679, 0.9874530609946233, 0.01555750767299127, -0.2713866487172902;

-0.1169189217790431, 0.002959570056834053, 0.9931370533189793, 0.2972179362477677]

inliers = 8/8

Solution 2/2

E = [-0.03591391645196601, -1.477446351363468, -0.6218820848126339;

1.436193382639371, -0.04882832844378427, -1.156326639471349;

0.6737905241157682, 1.11404943074441, -0.04208835387859701]

Detected a reflection in P. Multiplying all values by -1.

P = [0.9996397798815372, -0.01490468428539671, -0.02231951757418038, -0.5741231241023349;

0.0143598572020866, 0.999599751239123, -0.02437481946228576, 0.3359458978504272;

0.02267388320352681, 0.02404553407674684, 0.9994537044362969, -0.7466746226358877]

inliers = 7/8

——————————————————————————–

dataset projections matrixes mentioned above.

I’ve no idea of which one would be correct?

also would you tell me a bit about deep test?

The E matrix is determined up to an unknown scale. Divide all the E matrices by E(3,3) to get a better comparison. I get nearly identical results with Nister’s except the matrix is a transpose of mine (no idea why).

yes that’s right. thanks.

to compute projections matrix, it gives us 4 answer that one of them is correct.

M1 = R1 * [I T];

M2 = R2 * [I -1*T];

M3 = R1 * [I T];

M4 = R2 * [I -1*T];

which one is correct? one way is to use each Mi and reconstruct matching points using them and check if z has a positive value. (that show us a 3d point infront of camera)

when I implement this way, it removes 2 wrong answers, but one wrong answer remains.

Is there any way else?

also I’ve got same result as you between my possible answers for projection matrixes.

but our result is not same as dataset given one. any idea why it happened? (I’ve normalized all of them using p(3,4) to compare)

I don’t see the dataset projection matrices you mentioned in the previous post.

dataset’s given projection matrixes for these images:

1.000000 0.000000 0.000000 0.000000

0.000000 1.000000 0.000000 0.000000

0.000000 0.000000 1.000000 0.000000

0.954262 -0.140546 0.263872 -6.723805

0.144974 0.989431 0.002719 -0.643127

-0.261466 0.035661 0.964552 1.309906

Not sure on this one. It seems the translation has been scaled, the 4th column usually has a length of 1 for unknown scale. Do you have a reference for the dataset?

I upload for you calibration params of dataset:

http://5.9.197.46/static/calibParams.txt

also these are pics I’ve working with (camera 0 ,1 frame 40)

http://5.9.197.46/static/lybe0_40.bmp

http://5.9.197.46/static/lybe1_40.bmp

There seems to be two distortion coefficents in the text file. Did you take them into consideration when normalising the points?

oh really?

dataset doesn’t have any documentation and I couldnt understand it. what are these distortion coefficents and how can I consider them in normalizing points. thank you very much for your time …..

Have a look at http://www.cs.cornell.edu/~snavely/bundler/bundler-v0.4-manual.html

It’s one possibility for the distortion.

From my correspondence with another person who did some research on this I think the conclusion is there’s not obvious way to resolve all ambiguities. It’s one of those degenerate problems. You’ll probably have to resort to some heuristic.

hi!, i don’t understand the meaing Of symbolic maths ‘M’,in Mblock.h .I also don’t konw how you calculate the determinant. could you give some explain . thanks!

The M is an arbitrary name. It’s the result of a expanding the symbolic matrix conditions. See the original paper to see what I mean. I made a reference to the determinant calculation in README.txt.

in Mblock.h, the element of M matrix only include e** which is the result of SVD . In the coefficients matrix M , I don’t find the hidden variable z.

The determinant calculation kicks in around line 99 in 5point.cpp. I wrote a class called PolyMatrix, to store the symbolic matrix coefficients.

With two constraints ,it can get 10 equations about x y z. The max power is 3. why regard x^3, y^3, x^2*y, …,x,y as Linear element, compute though determinant and SVD ,instead of linearized equation. Thanks!

Hi,

I’m not entirely sure what you are asking, but there’s a good chance I won’t be able to answer your question either. I simply just followed the instructions outlined in the original paper. I don’t know too much about the underlying theories behind it. The paper might answer your question, it’s only 4 pages.

The file ‘5point-Eigen-0.1.5/5point/5point.h’ seems incorrect.

It includes .

It includes opencv2/core/core.hpp.

Can not find definitions for EMatrix and PMatrix type..

Fixed now. A bad package.

I have one short question and I’m sorry if this is not the right pace for it. Lets say I used the 5point algorithm to get the relative movement between the first two frames and i have a third frame and want to know the continued movement of the camera i cant just use the algorithm again on the second and third frame or can I? Is there a simple solution for this or should i use solvePnP with the triangulated 3d points from the first 2 frames to get the camera pos in the third one?

I think your suggestion is the easiest I can think of. That is triangulate the 3D points from the first two and get the pos of the third one.