This is an OpenCV port of Robust Pose Estimation from a Planar Target (2006) by Gerald Schweighofer and Axel Pinz using their Matlab code from the link in the paper. It is used to determine the pose of a planar target. It can be used for Augmented Reality to track a planar target, such as a business card.

The algorithm is very robust and does not require accurate calibration of the camera for Augmented Reality applications. I found that a rough guess for the FOV of my webcam worked.

# Download

Last update: **12 March 2012**

RobustPlanarPose-1.1.2.zip

Changes from last version

- Added a check to see if the root finding code returned any valid solutions (thanks to csantos)

# Requirements

- OpenCV 2.x or later

# Usage

On Linux, type **make** to compile the demo code and **./demo **to see how it works. On Windows you have to make your own project.

**Note:** For Visual Studio users you might get a compiler error about undefined M_PI_2, please visit http://msdn.microsoft.com/en-us/library/4hwaceh6.aspx for a fix.

To use this code in your own program you only need to call the function

**RPP:Rpp(const cv::Mat &model_3D, const cv::Mat &iprts, cv::Mat &Rlu, cv::Mat &tlu, int &it1, double &obj_err1, double &img_err1);**

A description of the parameters are as follows:

**model_3D** are the fixed 3D points of the object you want to track, they should never change throughout the program. For example, say we have a square card we want to track, we can arbitrarily choose (-1,-1,0), (-1,1,0), (1,1,0), (1,-1,0) as the 4 corners point in 3D space. The values you choose will determine the reference frame for which the estimated pose will be relative to. Using the same example, (0,0,0) in this case will be at the centre of the square card.

**iprts** are the **normalised** image points. You will need to form a typical 3×3 camera matrix to normalise the points, which we shall call K. K typically looks like

fx/fy are the focal lengths (in pixels) and cx/cy are the optical centres. For Augmented Reality application, I find you can get away with rough approximations. For example, say the webcam has a horizontal FOV of about 60 degrees and the image size is 640×480. Then the 4 parameters are:

cx = 640/2 = 320

cy = 480/2 = 240

fx = fy = cx/tan(60/2 * pi / 180) = 554.26

Now that you have K, to normalise the points simply do

where x and y are the un-normalised image points

**
Rlu** is the returned 3×3 rotation matrix. If you pass in a valid 3×3 matrix the it will be used as the initial guess of the rotation. In a tracking scenario you would continuously feedback the returned Rlu as input for the next frame.

**tlu** is the returned 3×1 translation vector

**it1** is the number of iterations done

**obj_err1** is the object projection error in 3D

**img_err1** is the image projection error, in normalised image point space

# License

Simplified BSD License, unless superseded by a license from the original Matlab code or the copy and paste code for Rpoly.cpp.

hi

I got this result when i run program

Rotation matrix

0.833827 -0.329443 -0.442944

0.034358 -0.769870 0.637275

-0.550955 -0.546596 -0.630620

Translation matrix

-0.115157

1.284543

11.195500

Number of iterations: 29

Object error: 0.020854

Image error: 0.066171

Failure. No convergence after 20 shifts. Program terminated.

Input normalised image points (using 3×3 camera intrinsic matrix):

0: (-0.0168, 0.0377)

1: (0.0277, 0.0373)

2: (-0.0824, 0.0386)

3: (0.0317, 0.036)

4: (-0.1015, -0.008)

5: (0.0866, 0.1179)

6: (0.1233, 0.1035)

7: (0.0667, 0.1102)

8: (0.0969, 0.166)

9: (0.0622, 0.1608)

Input model points (fixed 3D points chosen by the user, z is fixed to 0.0):

0: (0.0685, 0.6383, 0.0)

1: (0.4558, 0.7411, 0.0)

2: (-0.7219, 0.7081, 0.0)

3: (0.7061, 0.2887, 0.0)

4: (-0.9521, -0.2553, 0.0)

5: (0.4636, 0.0159, 0.0)

6: (-0.101, 0.2817, 0.0)

7: (0.6638, 0.1582, 0.0)

8: (0.3925, -0.7954, 0.0)

9: (0.6965, -0.7795, 0.0)

Pose found by RPP

Rotation matrix

0.833820 -0.329449 -0.442954

0.034357 -0.769873 0.637271

-0.550967 -0.546588 -0.630617

Translation matrix

-0.115156

1.284540

11.195475

Number of iterations: 33

Object error: 0.020854

what is this msg is about

“Failure. No convergence after 20 shifts. Program terminated.”

and how to get same results as that in matlab version

That error is from a root finding algorithm that I got off the net. That error is probably dependent on the matrix backend that OpenCV uses. You got different numerical values to the ground truth. What OpenCV version are you using?

Thanks for the reply

i am using opencv 2.3.1

Same version as I’m using. Did you compile it yourself? By default it uses Eigen for its matrix operations.

I compiled as it is. Do i have to included libraries for Eigen. what are you results

Eigen is enabled by default in OpenCV. My output is:

Pose found by RPP

Rotation matrix

0.857628 -0.311788 0.408977

0.160469 -0.593308 -0.788819

0.488593 0.742141 -0.458806

Translation matrix

-0.108252

1.266009

11.198547

which is close to Matlab results. I don’t get any errors. My gcc version is:

gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)

what compiler are you using?

i am using MSVC2010. Is this is the problem

I have not tried it in Visual Studio. Maybe the default floating model behaves in an unexpected way. Try setting it to “strict” (somewhere in the optimization setting, floating point model) and see if that changes anything.

Just updated the code. Try the new one out and see if it works for you.

Thank you for update. its works fine for me now with no errors.

i also got error when i complied first time is undefined symbol like M_PI_2 and i set _USE_MATH_DEFINES as preprocessor definition

I wasn’t aware of that define, that’s quite useful to know thanks.

i tried with changing setting to strict but it gave me same results. i think my results are different because of this error

Failure. No convergence after 20 shifts. Program terminated.

you know how to resolve this

Edit Rpoly.cpp, there are a few lines with hard coded 20 somewhere in it. Try increasing that.

i also tried that yesterday but didn’t work

Well I’m out of ideas. I got Visual Studio 2010 installed, so when I have time I’ll give it a try.

Thank you for your help

This is what I get using Visual Studio 2010, 64bit mode.

Pose found by RPP

Rotation matrix

0.857628 -0.311788 0.408977

0.160469 -0.593308 -0.788819

0.488593 0.742141 -0.458806

Translation matrix

-0.108252

1.266009

11.198547

Number of iterations: 35

Object error: 0.000471

Image error: 0.066120

Identical to what I got in Linux. I used a CLR visual studio project, with Common Language Runtime Support set to “No Common Language Runtime Support”, with your #define mentioned earlier for M_PI_2. And that’s it.

Also forgot to mention I’m using Visual Studio 2010 Express, with all the latest service pack and what not. This includes Visual Studio 2010 service pack 1, Windows 7 SDK (required for 64bit) and Microsoft Visual C++ 2010 Service pack.

The latter can be downloaded from http://www.microsoft.com/download/en/details.aspx?id=4422

Which type of CLR project and you tried with 64bit application.

CLR Empty Project

i got lots of trouble in running CLR project and specially when i changed Common Language Runtime Support to â€œNo Common Language Runtime Support”. Have you tried with standard win32 console application.

When pretty smooth for me. What problems did you get? I haven’t tried Win32 console.

Hey man, great job (assuming). I’ve been trying to run it on a VMWare’d instance of Ubuntu 11.04 with OpenCV 2.3.1 and gcc 4.5.2, but I always either get the convergence failure followed by a segfault or a glibc free() or fastbin memory error. Are any of these things you’ve run into or are they just problems on my end?

I haven’t ran into them personally. But it sounds like there might be legitimate issues. I’m planning on cleaning the code up a bit and running it through Valgrind or something similar to find any bugs.

Can you try the new version I just uploaded and see how it goes.

I got another convergence failure with a segfault.

Can you compile with the -g flag, edit Makefile. And run the program via gdb ./demo, type run, and see where it crashes.

Hi Nghia, excellent work. I have a question. I have implemented your code in combination with colour segmentation based point tracking. I am tracking a 100x100mm square, with the corners indicated by coloured dots, using a HD webcam. I have defined the (planar) object as follows:

Object model:

// Object model

// First row holds x-coordinates, second row holds corresponding y-coordinates

double modelPoints_data[8] = { -0.05, 0.05, -0.05, 0.05,

0.05, 0.05, -0.05, -0.05};

.. Later ..

// Insert data into RPP model input matrix

for(int i=0; i < 4; i++) {

modelPoints.at(0,i) = modelPoints_data[i]; // x

modelPoints.at(1,i) = modelPoints_data[i+4]; // y

}

E.g. similar to your demo code. You see the square center is the origin of the coordinate system, and the points as follows:

___________

| 1 2 |

| |

| |

|_3______4_|

With the program running, I output the modelPoints and imagePoints matrices (the latter holds the normalized image points, using a camera matrix equal to your example of 60 deg FoV, which seems to be accurate for this camera):

objectPoints:

-0.050000 0.050000 -0.050000 0.050000

0.050000 0.050000 -0.050000 -0.050000

0.000000 0.000000 0.000000 0.000000

imagePoints:

-0.049345 0.114326 -0.040100 0.123716

-0.042747 -0.052067 0.119452 0.112352

1.000000 1.000000 1.000000 1.000000

The object is in this case straight in front of the camera, and I manually select the points by mouse in the order as they are defined in the object model. Thus I would expect near-zero x and y translation, say 0.3m z-translation, and an near-identity rotation matrix.

The resulting translation vector seems to be very accurate according to initial measurements, however I am somewhat puzzled by the rotation matrix:

Rotation matrix:

0.994831 -0.056219 -0.084559

-0.054371 -0.998232 0.024007

-0.085760 -0.019286 -0.996129

This is not approximately an identity matrix as I expected, as two of the diagonals are -1. The values correspond to a rotation of +- pi radians about the x-axis of the object. If I select the points in “wrong” order, that corresponds to this rotation, I do obtain the expected identity matrix.

Do you have any thoughts on this?

Hmmm, is this a mouse Y co-ordinate inversion issue? You have positive Y as pointing “up”, but screen co-ordinate points “down”.

You are exactly right. Thank you. I solved this by changing

// Insert data into RPP image input matrix

for(int i=0; i < 4; i++) {

imagePoints.at(0,i) = pointPositions[i][0]; // x

imagePoints.at(1,i) = yMax-pointPositions[i][1]; // y

`// normalize image points`

imagePoints.col(i) = Kinv * imagePoints.col(i);

}

into

// Insert data into RPP image input matrix

for(int i=0; i < 4; i++) {

imagePoints.at(0,i) = pointPositions[i][0]; // x

imagePoints.at(1,i) = yMax-pointPositions[i][1]; // y; yMax-value displaces the origin to bottom-left instead of OpenCV's top-left

`// normalize image points`

imagePoints.col(i) = Kinv * imagePoints.col(i);

}

As indeed the OpenCV image origin is top-left and the camera origin is bottom-left. However one more question; after doing this the results are correct, but the iterations per pose estimation has gone up from 1 (for no rotation, i.e. camera directly in front of target), to the range of 200-700. As the no rotation case should be the easiest case to estimate, this is unexpected. What is the cause of this?

I suppose it could be mended by performing the estimation inverted and then doing a rotation afterwards, but of course this would not be feasible.

Try re-using your last rotation matrix found as the input into the RPP input. By default, if you give it an unitialised matrix, RPP will do a random initialisation to try and find the answer.

Thanks for the tip. Indeed I did do that before. That did not help in this particular case; I suspect the ambiguity of the target (square) was the cause of the problem. Currently I’m using an unambiguous target, which results in only 1-20 iterations usually.

Doctor Nghia,

Thank you very much for the source code and explanation.

I think you should post the source code to GitHub, it’s a great place to share source code (easy to view, update source code, easy for others to comment, contribute etc.):

https://github.com/

One of these days I will learn to use Github ðŸ™‚

Hello friend,

Under certain conditions, the error “No convergence after 20 shifts” seems to be inevitable. In such conditions, the program dies with a segfault error. My suggestion to correct this issue is by adding the following line:

`if(!bl.size()) return false;`

In the file RPP.cpp, line 882.

Thanks

Hi,

That sounds like a very good idea!

Could you explain the demo a bit more? I’m curious as to where you got the values for the error calculation as well as the meaning of the pairs of numbers (0,1),(0,2),(2,1) etc. used in the error calculation. Also, if there was a normalization function for the image points, that would be awesome.

Hi,

The demo uses the original values from the Matlab code I ported. The demo tries to find the 3D rotation/translation for transforming model_data -> iprts_data (3D -> 2D). The model_data is assumed to be a plane lying on z=0.

The error calculation is a simple comparison between the results from demo.cpp and the original Matlab values I got (ground truth). I’m just comparing each value in the matrix against the known answer.

I just realised the comment “Using fabs instead of sum square, in case the signs are opposite” does not match what I actually did ðŸ™‚ Should have deleted that comment.

I see, thanks.

Do you think you could sanity check my normalization function?

void normalizeImagePoints(double * pointVector, Mat & imagePoints, int pointCount,

int screenWidth, int screenHeight, int FOV )

{

/*normalizedPoints = (double *)malloc(sizeof(double) * pointCount);*/

double cx = screenWidth/2, cy = screenHeight/2;

double fx , fy;

fx = fy = cx / tan(FOV/2 * 3.14 / 180);

Mat kMat = Mat( 3, 3, CV_64F);

Mat kInv = Mat( 3, 3, CV_64F);

kMat.at(0,0) = fx;

kMat.at(1,1) = fy;

kMat.at(0,2) = cx;

kMat.at(1,2) = cy;

cv::invert(kMat,kInv);

for (int i = 0; i < pointCount; i++)

{

imagePoints.at(0,i) = pointVector[i * 2];

//pointVector[i]; // * 2];

imagePoints.at(1,i) = screenHeight – pointVector[(i*2)+1];

//screenHeight – pointVector[i+pointCount];//(i*2)+1];

imagePoints.col(i) = kInv * imagePoints.col(i);

}

}

Not getting the expected identity vector (function based on Wesley’s comment)… however my issue is that the 1’s column seems transposed (as well as with negatives…):

Rotation Matrix:

0.000897 -1.000000 -0.000041

-0.000166 -0.000041 1.000000

-1.000000 -0.000897 -0.000166

the version RobustPlanarPose-1.1.2.zip downloaded from above link,but there is no difference between 1.1.2 and version 1.1.1.

are you sure that you have upload the version 1.1.2?

The link was incorrect, it’s fixed now. There’s only a tiny change in RPP.cpp at line 883

// Suggestion by csantos

if(bl.empty()) {

return false;

}

hmm noticed some problems in that code that I pasted actually (uninitialized stack space, the one not being set in the K matrix, etc.), however after fixing it, my results are still fairly similar to what they were previously… :\

Your code looks about right. Check to make sure your imagePoints is initialised correctly so that the third tow is all 1.

Hi Nghiaho,

First of all, thanks for your great work! I’ve been looking for a robust pose estimation algorithm for a while now and yours works really well. I tried your demo and the results are identical to your output.

However, I’m experiencing some problem when my planar target is perpendicular to the camera and appears square on the image. For this specific case, the resulting pose shifts between each frame. I have drawn a line perpendicular to the planar target using the extrinsic parameters and this line keeps moving between each frame even if the coordinates of the corners are almost still. Do you know what may cause this behaviour?

I’m using OpenCV 2.1, Visual Studio 2008 Pro, Labview 2011 SP1 and a Thorlabs DCC1545M camera with a Theia SY110 corrected wide-angle lens

Thanks

Hi,

How many points are you using to estimate the pose? And are the points accurate?

Hi again,

Unfortunately, I can only use 4 points to estimate pose, but I made several functions to improve the corner detection and it is very accurate. However, this behaviour occurs more often when the target is farther from the camera thus having less accuracy on the position. I thought the precision I had was enough but it probably isn’t.

Thanks for your reply!

For my augmented reality stuff there was a significant improvement as the number of points used increased. It’s a shame you can’t use more than 4.

I have exactly the same problem, I’m using a square target and I was using only 4 points, I increased the number of points to 8 points; it is better but I still face the same problem, any clues?

Another problem is I get very weird results when I don’t provide an initial pose, which I get from basic homography computation.

The algorithm is probably very sensitive to noise. You want as much as points as possible. In my AR stuff I used as much point as possible for the homography calculation. Once I had a stable homography I used it to compute 4 corner points as input to the pose algorithm. I think I did this for speed reasons, because it’s cheaper to calculate the homography compared to the pose.

Are you using square targets, and if you are did you try putting the camera orthogonal to the square target?

I’ve tried with a rectangle business card and square CD case. Both work as long as there are enough points to match. Are your points normalised using the camera’s intrinsic matrix?

Yes the points are normalized, and I use the camera intrinsic matrix after doing the camera checkboard calibration. But I’m wondering what are the points you are using other than the 4 corner points?

P.S If I’m flooding you with comments, I’ll be glad if you contact me via email, and thanks for your support

I’m using points within the plane itself. For example, a CD cover has an artwork with lots of visual feature points that can be used. What is your object?

Replying via comments is fine, might help others in the process.

I’m using a marker bold square with some code in it. So let me ask are you measuring the points in 3D then project it, in other words how do you find the 2D-3D mapping knowing that the tracked object can differ?

I have an offline image of the CD cover stored in memory. The CD cover image was taken so it faces directly in front of the camera. I assign it some arbitrary 3D co-ordinates. When I do the initial 2D-2D match, I find where the 2D points land on the offline image and from this I can get a 2D-3D match.

Ummm, so its not a 100% online approach. I’m not sure but do you think there’s a better way?

The CD cover is tracked in real-time. What did you have in mind?

I meant in your approach you were using a previously captured image. I was thinking may be I can use a markerless tracking system along with a marker-based one and try to find the 3D mappings

Dear Nghia,

Great work! That helps me a lot for implementing an AR system.

Still, I am stuck at a rendering problem about how to set the OpenGL modelview matrix using the estimated rotation and translation matrix from RPP. It seems the signs of some values need to be changed to negative (in your code commented with “bit of guess work”), but I don’t know why and which exactly terms to change. Could you help to explain that ?

Thank you very much!

Best,

Arthur

I’ve got code in the MarkerlessARDemo that does exactly what you want. Download the code and look for this bit in ARPanel.cpp:

// Grab the translation/rotation information

m_t = g_AR.GetTranslation();

// NOTE the negative -z, OpenGL postive z is out of the sceen(0,0), m_t.at (1,0), -m_t.at (2,0));

glTranslatef(m_t.at

// Apply camera rotation to the model

m_rot = g_AR.GetRotation();

m_rot = g_AR.CorrectRotationForOpenGL(m_rot);

MultRotationMatrix(m_ro );

// Draw our teapot model

What are the required changes if I want to use the image points in pixels not the normalized ones?

Another question, I don’t see in your code anything about the intrinsic parameters of the camera, or am I missing something?

I don’t think that algorithm can work without normalized points. I’ve found you don’t need an accurate intrinsic matrix to get usable results. When I used it on my webcam I just took a guess and never bothered to confirm it via checkboard calibration. The intrinsic parameter is the K matrix mentioned on the page.

Hi Thanks for the great work.

I came across this algorithm and want to use it to replace the use of opencv’s builtin solvePnP to calculate the camera’s rotational and translation matrices based on corners of detected markers. Then I’m using the two matrices to project some 3d points back to the image plane. It worked fine with using solvePnP but I get some jittering effects and that’s why I wanted to try this out.

The problem when I started using this algorithm is that I get really odd solutions:

data for Rpp

[0, 0, 1, 1;

0, 1, 1, 0;

0, 0, 0, 0]

[141.5075073242188, 148.9447631835938, 265.8343505859375, 272.9588012695313;

379.0839538574219, 266.201904296875, 264.2275085449219, 371.3097839355469;

1, 1, 1, 1]

(vector of point3f)

data for solvepnp

[0, 0, 0; 0, 1, 0; 1, 1, 0; 1, 0, 0]

(vector of point2f)

[141.50751, 379.08395; 148.94476, 266.2019; 265.83435, 264.22751; 272.9588, 371.

30978]

result from Rpp

Rotation matrix

[0.9986703642011102, -0.05155098095899455, -5.346740956264395e-006;

-0.05155097733117901, -0.9986703235488186, 0.0002856555596111147;

-2.006545583506777e-005, -0.0002850001120310718, -0.999999959186156]

Translation matrix

[1.211351025315951; 3.136464760403361; 0.008283444467592525]

result from solvePnP

Rotation matrix

[2.56306007280152; -0.0845208981807476; 0.2373663176224296]

[-1.363359203478092; 1.056087155784301; 4.236560203317223]

Translation matrix

[4.61047e+019, 1.18111e+020]

I’m wondering if it’s because of using this algorithm incorrectly?

Sorry, I didn’t convert the result of raux back to 3×3 matrices. I redid the calculation and is shown below.

data for Rpp

[0, 0, 1, 1;

0, 1, 1, 0;

0, 0, 0, 0]

[218, 237.3515777587891, 420.9180297851563, 447.0541687011719;

422, 245.8782348632813, 244.9852905273438, 426.8499755859375;

1, 1, 1, 1]

data for solvepnp

[0, 0, 0; 0, 1, 0; 1, 1, 0; 1, 0, 0]

[218, 422; 237.35158, 245.87823; 420.91803, 244.98529; 447.05417, 426.84998]

result from Rpp

[0.07548107378172445, 0.9971429198010968, -0.002933426546021601;

-0.9971421371442634, 0.07547095480045923, -0.0034195486365947;

-0.003188390209630636, 0.003183154418194693, 0.9999898507964076]

[1.154120170651896; 2.191182270703893; 0.005600938552510155]

result from solvePnP

[-0.79416007, -0.59182167, -0.1380467;

0.39268467, -0.67311519, 0.62666953;

-0.46379793, 0.44346708, 0.7669605]

[-0.45007941; 0.80028373; 2.418669]

Okay, never mind I realized the problem was not normalising the image points. I fixed the problem by normalizing the points using undistortPoints with camera matrices.

I ran into another problem with RPP. It works fine the first time I call the function, but when I call the function again with a previous Rotational matrix, I get the following errorr

OpenCV Error: Assertion failed (a_size.width == len) in unknown function, file .

.\..\..\src\opencv\modules\core\src\matmul.cpp, line 718

sorry for spamming this. But i realized the problem was I converted the rotation matrix to 3X1 and passed it back to the function in the second loop.

what is this “raux” you mentioned in the reply? Is it the rotation result rvec from the solvePNP?

Just as a final update, it turns out that the error between using RPP and solvePNP is less than 1e-5 for both the rotational and translation matrices

Hey there,

Seems like you got everything under control! Thanks for the doing a comparison between RPP and solvePNP, nice to know.

Hi

Happy new year and thanks for the porting to C++, it’s quite clear how to use it.

I compiled it easily with visual studio 2013, but now I would like to use it in an Android app, and I need to compile it with the Android NDK, have you ever tried that? I get a lot of errors when doing so…

Well, I figured what was the problem. in RPP.h ObjPose is declared as

void ObjPose(const cv::Mat P…….

and in RPP.cpp as

void ObjPose(const cv::Mat _P,……

The compiler did not like the switch between _P and P.

Ah, that’s useful to know!

hello, I have a doubt about the normalised image points using camera matricx . ’cause I can’t see the picture for it. Could you send me the picture in this page.Thanks.

lg.victor@163.com

what is the nomalised points???

It’s just inv(K) * [x y 1], where K is the 3×3 camera matrix.

thank you

Hi, thanks very much for the code.

Originally I used cv::solvePNP for a planar pattern pose estimation. However, I found that in some cases (where the pattern face frontal parallel to the camera), the function returns unstable results from time to time. I presume there is an ambiguous case to be removed, so I applied Schweighofer’s paper for solving the issue. Essentially I applied your RPP code by using an initial guess calculated from the solvePNP. It still gives me similar unstable result, so the RPP is not able to remove the ambiguous case.

Do you have any suggestion for this?

Thanks very much!

Hmm, I don’t have any solution for this. Maybe other people here do.

Maybe the unstable solvePNP results give a bad initial value to to RPP thus confuse the latter algorithm. Please try to use the history value of RPP to do initialization.