Pose Estimation For Planar Target

This is an OpenCV port of Robust Pose Estimation from a Planar Target (2006) by Gerald Schweighofer and Axel Pinz using their Matlab code from the link in the paper. It is used to determine the pose of a planar target. It can be used for Augmented Reality to track a planar target, such as a business card.

The algorithm is very robust and does not require accurate calibration of the camera for Augmented Reality applications. I found that a rough guess for the FOV of my webcam worked.

Download

Last update: 12 March 2012
RobustPlanarPose-1.1.2.zip

Changes from last version

Added a check to see if the root finding code returned any valid solutions (thanks to csantos)

Requirements

OpenCV 2.x or later

Usage

On Linux, type make to compile the demo code and ./demo to see how it works. On Windows you have to make your own project.

Note: For Visual Studio users you might get a compiler error about undefined M_PI_2, please visit http://msdn.microsoft.com/en-us/library/4hwaceh6.aspx for a fix.

To use this code in your own program you only need to call the function

RPP:Rpp(const cv::Mat &model_3D, const cv::Mat &iprts, cv::Mat &Rlu, cv::Mat &tlu, int &it1, double &obj_err1, double &img_err1);

A description of the parameters are as follows:

model_3D are the fixed 3D points of the object you want to track, they should never change throughout the program. For example, say we have a square card we want to track, we can arbitrarily choose (-1,-1,0), (-1,1,0), (1,1,0), (1,-1,0) as the 4 corners point in 3D space. The values you choose will determine the reference frame for which the estimated pose will be relative to. Using the same example, (0,0,0) in this case will be at the centre of the square card.

iprts are the normalised image points. You will need to form a typical 3×3 camera matrix to normalise the points, which we shall call K. K typically looks like

$K = \left[\begin{array}{ccc}fx & 0 & cx \\ 0 & fy & cy \\ 0 & 0 & 1 \end{array}\right]$

fx/fy are the focal lengths (in pixels) and cx/cy are the optical centres. For Augmented Reality application, I find you can get away with rough approximations. For example, say the webcam has a horizontal FOV of about 60 degrees and the image size is 640×480. Then the 4 parameters are:

cx = 640/2 = 320
cy = 480/2 = 240
fx = fy = cx/tan(60/2 * pi / 180) = 554.26

Now that you have K, to normalise the points simply do

$\left[\begin{array}{c} x^{\prime} \\ y^{\prime} \\ 1 \end{array}\right] = K^{-1} \left[\begin{array}{c} x \\ y \\ 1 \end{array}\right]$

where x and y are the un-normalised image points

Rlu is the returned 3×3 rotation matrix. If you pass in a valid 3×3 matrix the it will be used as the initial guess of the rotation. In a tracking scenario you would continuously feedback the returned Rlu as input for the next frame.

tlu is the returned 3×1 translation vector

it1 is the number of iterations done

obj_err1 is the object projection error in 3D

img_err1 is the image projection error, in normalised image point space

License

Simplified BSD License, unless superseded by a license from the original Matlab code or the copy and paste code for Rpoly.cpp.

91 thoughts on “Pose Estimation For Planar Target”

Rizwan Macknojia says:

December 1, 2011 at 2:23 pm

hi

I got this result when i run program

Rotation matrix
0.833827 -0.329443 -0.442944
0.034358 -0.769870 0.637275
-0.550955 -0.546596 -0.630620

Translation matrix
-0.115157
1.284543
11.195500

Number of iterations: 29
Object error: 0.020854
Image error: 0.066171

Failure. No convergence after 20 shifts. Program terminated.
Input normalised image points (using 3×3 camera intrinsic matrix):
0: (-0.0168, 0.0377)
1: (0.0277, 0.0373)
2: (-0.0824, 0.0386)
3: (0.0317, 0.036)
4: (-0.1015, -0.008)
5: (0.0866, 0.1179)
6: (0.1233, 0.1035)
7: (0.0667, 0.1102)
8: (0.0969, 0.166)
9: (0.0622, 0.1608)

Input model points (fixed 3D points chosen by the user, z is fixed to 0.0):
0: (0.0685, 0.6383, 0.0)
1: (0.4558, 0.7411, 0.0)
2: (-0.7219, 0.7081, 0.0)
3: (0.7061, 0.2887, 0.0)
4: (-0.9521, -0.2553, 0.0)
5: (0.4636, 0.0159, 0.0)
6: (-0.101, 0.2817, 0.0)
7: (0.6638, 0.1582, 0.0)
8: (0.3925, -0.7954, 0.0)
9: (0.6965, -0.7795, 0.0)

Pose found by RPP

Rotation matrix
0.833820 -0.329449 -0.442954
0.034357 -0.769873 0.637271
-0.550967 -0.546588 -0.630617

Translation matrix
-0.115156
1.284540
11.195475

Number of iterations: 33
Object error: 0.020854

what is this msg is about
“Failure. No convergence after 20 shifts. Program terminated.”

and how to get same results as that in matlab version

Reply
1. nghiaho12 says:
  
  December 2, 2011 at 6:34 am
  
  That error is from a root finding algorithm that I got off the net. That error is probably dependent on the matrix backend that OpenCV uses. You got different numerical values to the ground truth. What OpenCV version are you using?
  
  Reply
Rizwan Macknojia says:

December 2, 2011 at 3:13 pm

Thanks for the reply
i am using opencv 2.3.1

Reply
1. nghiaho12 says:
  
  December 2, 2011 at 5:51 pm
  
  Same version as I’m using. Did you compile it yourself? By default it uses Eigen for its matrix operations.
  
  Reply
  1. Rizwan Macknojia says:
    
    December 3, 2011 at 9:30 pm
    
    I compiled as it is. Do i have to included libraries for Eigen. what are you results
    
    Reply
    1. nghiaho12 says:
      
      December 3, 2011 at 10:34 pm
      
      Eigen is enabled by default in OpenCV. My output is:
      
      Pose found by RPP
      
      Rotation matrix
      0.857628 -0.311788 0.408977
      0.160469 -0.593308 -0.788819
      0.488593 0.742141 -0.458806
      
      Translation matrix
      -0.108252
      1.266009
      11.198547
      
      which is close to Matlab results. I don’t get any errors. My gcc version is:
      
      gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)
      
      what compiler are you using?
      
      Reply
      1. Rizwan Macknojia says:
        
        December 3, 2011 at 11:05 pm
        
        i am using MSVC2010. Is this is the problem
    2. nghiaho12 says:
      
      December 3, 2011 at 11:16 pm
      
      I have not tried it in Visual Studio. Maybe the default floating model behaves in an unexpected way. Try setting it to “strict” (somewhere in the optimization setting, floating point model) and see if that changes anything.
      
      Reply
2. nghiaho12 says:
  
  December 7, 2011 at 3:35 am
  
  Just updated the code. Try the new one out and see if it works for you.
  
  Reply
  1. Rizwan Macknojia says:
    
    December 8, 2011 at 11:47 am
    
    Thank you for update. its works fine for me now with no errors.
    
    Reply
Rizwan Macknojia says:

December 3, 2011 at 11:28 pm

i also got error when i complied first time is undefined symbol like M_PI_2 and i set _USE_MATH_DEFINES as preprocessor definition

Reply
1. nghiaho12 says:
  
  December 3, 2011 at 11:31 pm
  
  I wasn’t aware of that define, that’s quite useful to know thanks.
  
  Reply
  1. Rizwan Macknojia says:
    
    December 3, 2011 at 11:35 pm
    
    i tried with changing setting to strict but it gave me same results. i think my results are different because of this error
    Failure. No convergence after 20 shifts. Program terminated.
    you know how to resolve this
    
    Reply
    1. nghiaho12 says:
      
      December 3, 2011 at 11:45 pm
      
      Edit Rpoly.cpp, there are a few lines with hard coded 20 somewhere in it. Try increasing that.
      
      Reply
      1. Rizwan Macknojia says:
        
        December 3, 2011 at 11:51 pm
        
        i also tried that yesterday but didn’t work
    2. nghiaho12 says:
      
      December 4, 2011 at 12:17 am
      
      Well I’m out of ideas. I got Visual Studio 2010 installed, so when I have time I’ll give it a try.
      
      Reply
      1. Rizwan Macknojia says:
        
        December 4, 2011 at 12:26 am
        
        Thank you for your help
    3. nghiaho12 says:
      
      December 4, 2011 at 8:09 am
      
      This is what I get using Visual Studio 2010, 64bit mode.
      
      Pose found by RPP
      
      Rotation matrix
      0.857628 -0.311788 0.408977
      0.160469 -0.593308 -0.788819
      0.488593 0.742141 -0.458806
      
      Translation matrix
      -0.108252
      1.266009
      11.198547
      
      Number of iterations: 35
      Object error: 0.000471
      Image error: 0.066120
      
      Identical to what I got in Linux. I used a CLR visual studio project, with Common Language Runtime Support set to “No Common Language Runtime Support”, with your #define mentioned earlier for M_PI_2. And that’s it.
      
      Reply
    4. nghiaho12 says:
      
      December 4, 2011 at 8:12 am
      
      Also forgot to mention I’m using Visual Studio 2010 Express, with all the latest service pack and what not. This includes Visual Studio 2010 service pack 1, Windows 7 SDK (required for 64bit) and Microsoft Visual C++ 2010 Service pack.
      
      The latter can be downloaded from http://www.microsoft.com/download/en/details.aspx?id=4422
      
      Reply
      1. Rizwan Macknojia says:
        
        December 4, 2011 at 8:37 pm
        
        Which type of CLR project and you tried with 64bit application.
    5. nghiaho12 says:
      
      December 5, 2011 at 3:18 am
      
      CLR Empty Project
      
      Reply
      1. Rizwan Macknojia says:
        
        December 5, 2011 at 12:26 pm
        
        i got lots of trouble in running CLR project and specially when i changed Common Language Runtime Support to “No Common Language Runtime Support”. Have you tried with standard win32 console application.
    6. nghiaho12 says:
      
      December 6, 2011 at 6:52 am
      
      When pretty smooth for me. What problems did you get? I haven’t tried Win32 console.
      
      Reply
Tim says:

December 6, 2011 at 5:46 pm

Hey man, great job (assuming). I’ve been trying to run it on a VMWare’d instance of Ubuntu 11.04 with OpenCV 2.3.1 and gcc 4.5.2, but I always either get the convergence failure followed by a segfault or a glibc free() or fastbin memory error. Are any of these things you’ve run into or are they just problems on my end?

Reply
1. nghiaho12 says:
  
  December 7, 2011 at 2:55 am
  
  I haven’t ran into them personally. But it sounds like there might be legitimate issues. I’m planning on cleaning the code up a bit and running it through Valgrind or something similar to find any bugs.
  
  Reply
2. nghiaho12 says:
  
  December 7, 2011 at 3:34 am
  
  Can you try the new version I just uploaded and see how it goes.
  
  Reply
  1. Tim says:
    
    December 7, 2011 at 11:19 am
    
    I got another convergence failure with a segfault.
    
    Reply
    1. nghiaho12 says:
      
      December 7, 2011 at 5:37 pm
      
      Can you compile with the -g flag, edit Makefile. And run the program via gdb ./demo, type run, and see where it crashes.
      
      Reply
Wesley says:

December 10, 2011 at 2:34 am

Hi Nghia, excellent work. I have a question. I have implemented your code in combination with colour segmentation based point tracking. I am tracking a 100x100mm square, with the corners indicated by coloured dots, using a HD webcam. I have defined the (planar) object as follows:

Object model:
// Object model
// First row holds x-coordinates, second row holds corresponding y-coordinates
double modelPoints_data[8] = { -0.05, 0.05, -0.05, 0.05,
0.05, 0.05, -0.05, -0.05};

.. Later ..

// Insert data into RPP model input matrix
for(int i=0; i < 4; i++) {
modelPoints.at(0,i) = modelPoints_data[i]; // x
modelPoints.at(1,i) = modelPoints_data[i+4]; // y
}

E.g. similar to your demo code. You see the square center is the origin of the coordinate system, and the points as follows:

___________
| 1 2 |
| |
| |
|_3______4_|

With the program running, I output the modelPoints and imagePoints matrices (the latter holds the normalized image points, using a camera matrix equal to your example of 60 deg FoV, which seems to be accurate for this camera):

objectPoints:
-0.050000 0.050000 -0.050000 0.050000
0.050000 0.050000 -0.050000 -0.050000
0.000000 0.000000 0.000000 0.000000

imagePoints:
-0.049345 0.114326 -0.040100 0.123716
-0.042747 -0.052067 0.119452 0.112352
1.000000 1.000000 1.000000 1.000000

The object is in this case straight in front of the camera, and I manually select the points by mouse in the order as they are defined in the object model. Thus I would expect near-zero x and y translation, say 0.3m z-translation, and an near-identity rotation matrix.

The resulting translation vector seems to be very accurate according to initial measurements, however I am somewhat puzzled by the rotation matrix:

Rotation matrix:
0.994831 -0.056219 -0.084559
-0.054371 -0.998232 0.024007
-0.085760 -0.019286 -0.996129

This is not approximately an identity matrix as I expected, as two of the diagonals are -1. The values correspond to a rotation of +- pi radians about the x-axis of the object. If I select the points in “wrong” order, that corresponds to this rotation, I do obtain the expected identity matrix.

Do you have any thoughts on this?

Reply
1. nghiaho12 says:
  
  December 10, 2011 at 4:11 am
  
  Hmmm, is this a mouse Y co-ordinate inversion issue? You have positive Y as pointing “up”, but screen co-ordinate points “down”.
  
  Reply
  1. Wesley says:
    
    December 10, 2011 at 8:43 pm
    
    You are exactly right. Thank you. I solved this by changing
    
    // Insert data into RPP image input matrix for(int i=0; i < 4; i++) { imagePoints.at(0,i) = pointPositions[i][0]; // x imagePoints.at(1,i) = yMax-pointPositions[i][1]; // y
    // normalize image points imagePoints.col(i) = Kinv * imagePoints.col(i); }
    
    into
    
    // Insert data into RPP image input matrix for(int i=0; i < 4; i++) { imagePoints.at(0,i) = pointPositions[i][0]; // x imagePoints.at(1,i) = yMax-pointPositions[i][1]; // y; yMax-value displaces the origin to bottom-left instead of OpenCV's top-left
    // normalize image points imagePoints.col(i) = Kinv * imagePoints.col(i); }
    
    As indeed the OpenCV image origin is top-left and the camera origin is bottom-left. However one more question; after doing this the results are correct, but the iterations per pose estimation has gone up from 1 (for no rotation, i.e. camera directly in front of target), to the range of 200-700. As the no rotation case should be the easiest case to estimate, this is unexpected. What is the cause of this?
    
    I suppose it could be mended by performing the estimation inverted and then doing a rotation afterwards, but of course this would not be feasible.
    
    Reply
    1. nghiaho12 says:
      
      December 10, 2011 at 9:00 pm
      
      Try re-using your last rotation matrix found as the input into the RPP input. By default, if you give it an unitialised matrix, RPP will do a random initialisation to try and find the answer.
      
      Reply
      1. Wesley says:
        
        December 27, 2011 at 11:16 pm
        
        Thanks for the tip. Indeed I did do that before. That did not help in this particular case; I suspect the ambiguity of the target (square) was the cause of the problem. Currently I’m using an unambiguous target, which results in only 1-20 iterations usually.
Ngoc says:

February 21, 2012 at 5:52 am

Doctor Nghia,
Thank you very much for the source code and explanation.

I think you should post the source code to GitHub, it’s a great place to share source code (easy to view, update source code, easy for others to comment, contribute etc.):
https://github.com/

Reply
1. nghiaho12 says:
  
  February 22, 2012 at 6:34 am
  
  One of these days I will learn to use Github 🙂
  
  Reply
csantos says:

March 11, 2012 at 2:38 pm

Hello friend,
Under certain conditions, the error “No convergence after 20 shifts” seems to be inevitable. In such conditions, the program dies with a segfault error. My suggestion to correct this issue is by adding the following line:
if(!bl.size()) return false;
In the file RPP.cpp, line 882.

Thanks

Reply
1. nghiaho12 says:
  
  March 11, 2012 at 6:10 pm
  
  Hi,
  
  That sounds like a very good idea!
  
  Reply
a5t says:

April 15, 2012 at 8:28 pm

Could you explain the demo a bit more? I’m curious as to where you got the values for the error calculation as well as the meaning of the pairs of numbers (0,1),(0,2),(2,1) etc. used in the error calculation. Also, if there was a normalization function for the image points, that would be awesome.

Reply
1. nghiaho12 says:
  
  April 16, 2012 at 8:14 am
  
  Hi,
  
  The demo uses the original values from the Matlab code I ported. The demo tries to find the 3D rotation/translation for transforming model_data -> iprts_data (3D -> 2D). The model_data is assumed to be a plane lying on z=0.
  
  The error calculation is a simple comparison between the results from demo.cpp and the original Matlab values I got (ground truth). I’m just comparing each value in the matrix against the known answer.
  
  I just realised the comment “Using fabs instead of sum square, in case the signs are opposite” does not match what I actually did 🙂 Should have deleted that comment.
  
  Reply
  1. a5t says:
    
    April 17, 2012 at 9:07 pm
    
    I see, thanks.
    
    Do you think you could sanity check my normalization function?
    
    void normalizeImagePoints(double * pointVector, Mat & imagePoints, int pointCount,
    int screenWidth, int screenHeight, int FOV )
    {
    /*normalizedPoints = (double *)malloc(sizeof(double) * pointCount);*/
    double cx = screenWidth/2, cy = screenHeight/2;
    double fx , fy;
    fx = fy = cx / tan(FOV/2 * 3.14 / 180);
    
    Mat kMat = Mat( 3, 3, CV_64F);
    Mat kInv = Mat( 3, 3, CV_64F);
    kMat.at(0,0) = fx;
    kMat.at(1,1) = fy;
    kMat.at(0,2) = cx;
    kMat.at(1,2) = cy;
    
    cv::invert(kMat,kInv);
    
    for (int i = 0; i < pointCount; i++)
    {
    imagePoints.at(0,i) = pointVector[i * 2];
    //pointVector[i]; // * 2];
    imagePoints.at(1,i) = screenHeight – pointVector[(i*2)+1];
    //screenHeight – pointVector[i+pointCount];//(i*2)+1];
    
    imagePoints.col(i) = kInv * imagePoints.col(i);
    }
    }
    
    Not getting the expected identity vector (function based on Wesley’s comment)… however my issue is that the 1’s column seems transposed (as well as with negatives…):
    Rotation Matrix:
    0.000897 -1.000000 -0.000041
    -0.000166 -0.000041 1.000000
    -1.000000 -0.000897 -0.000166
    
    Reply
miaoxikui says:

April 16, 2012 at 12:31 am

the version RobustPlanarPose-1.1.2.zip downloaded from above link,but there is no difference between 1.1.2 and version 1.1.1.
are you sure that you have upload the version 1.1.2?

Reply
1. nghiaho12 says:
  
  April 16, 2012 at 8:07 am
  
  The link was incorrect, it’s fixed now. There’s only a tiny change in RPP.cpp at line 883
  
  // Suggestion by csantos
  if(bl.empty()) {
  return false;
  }
  
  Reply
a5t says:

April 18, 2012 at 1:47 am

hmm noticed some problems in that code that I pasted actually (uninitialized stack space, the one not being set in the K matrix, etc.), however after fixing it, my results are still fairly similar to what they were previously… :\

Reply
1. nghiaho12 says:
  
  April 18, 2012 at 4:29 am
  
  Your code looks about right. Check to make sure your imagePoints is initialised correctly so that the third tow is all 1.
  
  Reply
labou333 says:

July 16, 2012 at 1:32 pm

Hi Nghiaho,

First of all, thanks for your great work! I’ve been looking for a robust pose estimation algorithm for a while now and yours works really well. I tried your demo and the results are identical to your output.

However, I’m experiencing some problem when my planar target is perpendicular to the camera and appears square on the image. For this specific case, the resulting pose shifts between each frame. I have drawn a line perpendicular to the planar target using the extrinsic parameters and this line keeps moving between each frame even if the coordinates of the corners are almost still. Do you know what may cause this behaviour?

I’m using OpenCV 2.1, Visual Studio 2008 Pro, Labview 2011 SP1 and a Thorlabs DCC1545M camera with a Theia SY110 corrected wide-angle lens

Thanks

Reply
1. nghiaho12 says:
  
  July 17, 2012 at 7:12 am
  
  Hi,
  
  How many points are you using to estimate the pose? And are the points accurate?
  
  Reply
  1. labou333 says:
    
    July 17, 2012 at 7:49 am
    
    Hi again,
    
    Unfortunately, I can only use 4 points to estimate pose, but I made several functions to improve the corner detection and it is very accurate. However, this behaviour occurs more often when the target is farther from the camera thus having less accuracy on the position. I thought the precision I had was enough but it probably isn’t.
    
    Thanks for your reply!
    
    Reply
    1. nghiaho12 says:
      
      July 17, 2012 at 7:59 am
      
      For my augmented reality stuff there was a significant improvement as the number of points used increased. It’s a shame you can’t use more than 4.
      
      Reply
      1. Mahmoud says:
        
        April 22, 2013 at 4:12 am
        
        I have exactly the same problem, I’m using a square target and I was using only 4 points, I increased the number of points to 8 points; it is better but I still face the same problem, any clues?
        Another problem is I get very weird results when I don’t provide an initial pose, which I get from basic homography computation.
      2. nghiaho12 says:
        
        April 22, 2013 at 4:22 am
        
        The algorithm is probably very sensitive to noise. You want as much as points as possible. In my AR stuff I used as much point as possible for the homography calculation. Once I had a stable homography I used it to compute 4 corner points as input to the pose algorithm. I think I did this for speed reasons, because it’s cheaper to calculate the homography compared to the pose.
      3. Mahmoud says:
        
        April 22, 2013 at 4:28 am
        
        Are you using square targets, and if you are did you try putting the camera orthogonal to the square target?
      4. nghiaho12 says:
        
        April 22, 2013 at 4:34 am
        
        I’ve tried with a rectangle business card and square CD case. Both work as long as there are enough points to match. Are your points normalised using the camera’s intrinsic matrix?
      5. Mahmoud says:
        
        April 22, 2013 at 4:43 am
        
        Yes the points are normalized, and I use the camera intrinsic matrix after doing the camera checkboard calibration. But I’m wondering what are the points you are using other than the 4 corner points?
        
        P.S If I’m flooding you with comments, I’ll be glad if you contact me via email, and thanks for your support
      6. nghiaho12 says:
        
        April 22, 2013 at 4:57 am
        
        I’m using points within the plane itself. For example, a CD cover has an artwork with lots of visual feature points that can be used. What is your object?
        
        Replying via comments is fine, might help others in the process.
      7. Mahmoud says:
        
        April 22, 2013 at 7:16 am
        
        I’m using a marker bold square with some code in it. So let me ask are you measuring the points in 3D then project it, in other words how do you find the 2D-3D mapping knowing that the tracked object can differ?
      8. nghiaho12 says:
        
        April 22, 2013 at 7:24 am
        
        I have an offline image of the CD cover stored in memory. The CD cover image was taken so it faces directly in front of the camera. I assign it some arbitrary 3D co-ordinates. When I do the initial 2D-2D match, I find where the 2D points land on the offline image and from this I can get a 2D-3D match.
      9. Mahmoud says:
        
        April 22, 2013 at 7:32 am
        
        Ummm, so its not a 100% online approach. I’m not sure but do you think there’s a better way?
      10. nghiaho12 says:
        
        April 22, 2013 at 7:42 am
        
        The CD cover is tracked in real-time. What did you have in mind?
      11. Mahmoud says:
        
        April 22, 2013 at 7:51 am
        
        I meant in your approach you were using a previously captured image. I was thinking may be I can use a markerless tracking system along with a marker-based one and try to find the 3D mappings
arthur1026 says:

August 6, 2012 at 10:27 pm

Dear Nghia,

Great work! That helps me a lot for implementing an AR system.

Still, I am stuck at a rendering problem about how to set the OpenGL modelview matrix using the estimated rotation and translation matrix from RPP. It seems the signs of some values need to be changed to negative (in your code commented with “bit of guess work”), but I don’t know why and which exactly terms to change. Could you help to explain that ?

Thank you very much!

Best,
Arthur

Reply
1. nghiaho12 says:
  
  August 7, 2012 at 6:11 am
  
  I’ve got code in the MarkerlessARDemo that does exactly what you want. Download the code and look for this bit in ARPanel.cpp:
  
  // Grab the translation/rotation information
  m_t = g_AR.GetTranslation();
  
  // NOTE the negative -z, OpenGL postive z is out of the sceen
  glTranslatef(m_t.at(0,0), m_t.at(1,0), -m_t.at(2,0));
  
  // Apply camera rotation to the model
  m_rot = g_AR.GetRotation();
  m_rot = g_AR.CorrectRotationForOpenGL(m_rot);
  MultRotationMatrix(m_ro );
  
  // Draw our teapot model
  
  Reply
Mahmoud Bahaa says:

March 19, 2013 at 12:05 am

What are the required changes if I want to use the image points in pixels not the normalized ones?
Another question, I don’t see in your code anything about the intrinsic parameters of the camera, or am I missing something?

Reply
1. nghiaho12 says:
  
  March 19, 2013 at 5:18 am
  
  I don’t think that algorithm can work without normalized points. I’ve found you don’t need an accurate intrinsic matrix to get usable results. When I used it on my webcam I just took a guess and never bothered to confirm it via checkboard calibration. The intrinsic parameter is the K matrix mentioned on the page.
  
  Reply
Sam Chuang says:

June 3, 2013 at 1:05 am

Hi Thanks for the great work.
I came across this algorithm and want to use it to replace the use of opencv’s builtin solvePnP to calculate the camera’s rotational and translation matrices based on corners of detected markers. Then I’m using the two matrices to project some 3d points back to the image plane. It worked fine with using solvePnP but I get some jittering effects and that’s why I wanted to try this out.

The problem when I started using this algorithm is that I get really odd solutions:
data for Rpp
[0, 0, 1, 1;
0, 1, 1, 0;
0, 0, 0, 0]
[141.5075073242188, 148.9447631835938, 265.8343505859375, 272.9588012695313;
379.0839538574219, 266.201904296875, 264.2275085449219, 371.3097839355469;
1, 1, 1, 1]

(vector of point3f)
data for solvepnp
[0, 0, 0; 0, 1, 0; 1, 1, 0; 1, 0, 0]

(vector of point2f)
[141.50751, 379.08395; 148.94476, 266.2019; 265.83435, 264.22751; 272.9588, 371.
30978]

result from Rpp
Rotation matrix
[0.9986703642011102, -0.05155098095899455, -5.346740956264395e-006;
-0.05155097733117901, -0.9986703235488186, 0.0002856555596111147;
-2.006545583506777e-005, -0.0002850001120310718, -0.999999959186156]

Translation matrix
[1.211351025315951; 3.136464760403361; 0.008283444467592525]

result from solvePnP
Rotation matrix
[2.56306007280152; -0.0845208981807476; 0.2373663176224296]
[-1.363359203478092; 1.056087155784301; 4.236560203317223]

Translation matrix
[4.61047e+019, 1.18111e+020]

I’m wondering if it’s because of using this algorithm incorrectly?

Reply
1. Sam Chuang says:
  
  June 3, 2013 at 1:14 am
  
  Sorry, I didn’t convert the result of raux back to 3×3 matrices. I redid the calculation and is shown below.
  
  data for Rpp
  [0, 0, 1, 1;
  0, 1, 1, 0;
  0, 0, 0, 0]
  
  [218, 237.3515777587891, 420.9180297851563, 447.0541687011719;
  422, 245.8782348632813, 244.9852905273438, 426.8499755859375;
  1, 1, 1, 1]
  
  data for solvepnp
  [0, 0, 0; 0, 1, 0; 1, 1, 0; 1, 0, 0]
  
  [218, 422; 237.35158, 245.87823; 420.91803, 244.98529; 447.05417, 426.84998]
  
  result from Rpp
  [0.07548107378172445, 0.9971429198010968, -0.002933426546021601;
  -0.9971421371442634, 0.07547095480045923, -0.0034195486365947;
  -0.003188390209630636, 0.003183154418194693, 0.9999898507964076]
  
  [1.154120170651896; 2.191182270703893; 0.005600938552510155]
  
  result from solvePnP
  
  [-0.79416007, -0.59182167, -0.1380467;
  0.39268467, -0.67311519, 0.62666953;
  -0.46379793, 0.44346708, 0.7669605]
  
  [-0.45007941; 0.80028373; 2.418669]
  
  Reply
  1. Sam Chuang says:
    
    June 3, 2013 at 5:48 am
    
    Okay, never mind I realized the problem was not normalising the image points. I fixed the problem by normalizing the points using undistortPoints with camera matrices.
    
    I ran into another problem with RPP. It works fine the first time I call the function, but when I call the function again with a previous Rotational matrix, I get the following errorr
    
    OpenCV Error: Assertion failed (a_size.width == len) in unknown function, file .
    .\..\..\src\opencv\modules\core\src\matmul.cpp, line 718
    
    Reply
  2. Sam Chuang says:
    
    June 3, 2013 at 6:24 am
    
    sorry for spamming this. But i realized the problem was I converted the rotation matrix to 3X1 and passed it back to the function in the second loop.
    
    Reply
  3. Lin Zhang says:
    
    October 22, 2015 at 1:29 pm
    
    what is this “raux” you mentioned in the reply? Is it the rotation result rvec from the solvePNP?
    
    Reply
Sam Chuang says:

June 3, 2013 at 6:35 am

Just as a final update, it turns out that the error between using RPP and solvePNP is less than 1e-5 for both the rotational and translation matrices

Reply
1. nghiaho12 says:
  
  June 3, 2013 at 9:24 am
  
  Hey there,
  
  Seems like you got everything under control! Thanks for the doing a comparison between RPP and solvePNP, nice to know.
  
  Reply
Xavier says:

January 2, 2014 at 12:24 pm

Hi

Happy new year and thanks for the porting to C++, it’s quite clear how to use it.
I compiled it easily with visual studio 2013, but now I would like to use it in an Android app, and I need to compile it with the Android NDK, have you ever tried that? I get a lot of errors when doing so…

Reply
1. Xavier says:
  
  January 2, 2014 at 12:39 pm
  
  Well, I figured what was the problem. in RPP.h ObjPose is declared as
  void ObjPose(const cv::Mat P…….
  and in RPP.cpp as
  void ObjPose(const cv::Mat _P,……
  
  The compiler did not like the switch between _P and P.
  
  Reply
  1. nghiaho12 says:
    
    January 2, 2014 at 6:04 pm
    
    Ah, that’s useful to know!
    
    Reply
lgvictor says:

December 3, 2014 at 9:35 pm

hello, I have a doubt about the normalised image points using camera matricx . ’cause I can’t see the picture for it. Could you send me the picture in this page.Thanks.
lg.victor@163.com

Reply
1. lgvictor says:
  
  December 3, 2014 at 9:57 pm
  
  what is the nomalised points???
  
  Reply
  1. nghiaho12 says:
    
    December 4, 2014 at 1:21 am
    
    It’s just inv(K) * [x y 1], where K is the 3×3 camera matrix.
    
    Reply
    1. lgvictor says:
      
      December 4, 2014 at 1:43 am
      
      thank you
      
      Reply
Lin Zhang says:

October 22, 2015 at 3:04 pm

Hi, thanks very much for the code.
Originally I used cv::solvePNP for a planar pattern pose estimation. However, I found that in some cases (where the pattern face frontal parallel to the camera), the function returns unstable results from time to time. I presume there is an ambiguous case to be removed, so I applied Schweighofer’s paper for solving the issue. Essentially I applied your RPP code by using an initial guess calculated from the solvePNP. It still gives me similar unstable result, so the RPP is not able to remove the ambiguous case.

Do you have any suggestion for this?

Thanks very much!

Reply
1. nghiaho12 says:
  
  October 23, 2015 at 12:34 am
  
  Hmm, I don’t have any solution for this. Maybe other people here do.
  
  Reply
2. Chen says:
  
  February 16, 2016 at 9:30 pm
  
  Maybe the unstable solvePNP results give a bad initial value to to RPP thus confuse the latter algorithm. Please try to use the history value of RPP to do initialization.
  
  Reply
Long Phung says:

February 13, 2017 at 11:12 am

How can i fix it?
First-chance exception at 0x00007FFC16F17788 in Pose-Estimation.exe: Microsoft C++ exception: cv::Exception at memory location 0x0000003019AFABA8.
The thread 0x1f90 has exited with code 0 (0x0).
The thread 0xfd0 has exited with code 0 (0x0).
The thread 0x167c has exited with code 0 (0x0).
The program ‘[4656] Pose-Estimation.exe’ has exited with code 0 (0x0).

Reply
1. nghiaho12 says:
  
  February 13, 2017 at 7:18 pm
  
  Compile in debug mode and step through the code to see where it crashes.
  
  Reply
Abhinav Garg says:

March 22, 2017 at 9:10 am

I am also facing stability issues in the response of opencv solvepnp for real time head pose estimation. I have written a small simulator to test the behavior of solvepnp in various scenarios. I observed the stability issues are connected to the 3d coordinates and camera relative position. May be this is something to do with the condition of equations formed to solve for R and t. I am still trying to figure out some condition number for given 3d coordinates and estimated R and t estimate the reliability of result.

I think the issue can be resolved by scaling the 3d coordinates which in my case are in my control.

Can someone help me out here?

Reply
Farid says:

November 10, 2017 at 4:36 am

Hi,
Thank you for your excellent code. I am running your code in VS2015 and get the same output as you. To make sure my normalisation routine is correct, I used the same model_data and un_normalised_image_points : { 1, 3, 3, 1,-1, -1, -2, -2 }, where the first 4 data is for x and the next 4 for y co-ordinates. I use the normalisation formula which you have stated on the un_normalised_image_points, where HorizFOV=60, xRes=640 and yRes=480 (same as you). The result for the normalised image points is : { -0.5755,-0.5719,-0.5719,-0.5755, -0.4348,-0.4348,-0.4366,-0.4366} (I have truncated the results to 4 decimal places here for clarity).
I was expecting to get a result which shows no translations or rotations as my model points and un-normalised image points are exactly the same. However the routine output says the results do not look ok. I would appreciate any comments from you.

Reply
1. nghiaho12 says:
  
  November 13, 2017 at 12:09 am
  
  What rotation and translation do you get?
  
  Reply
2. nghiaho12 says:
  
  November 13, 2017 at 2:40 am
  
  That unnormalized image points looks very suspect. Where did that come from?
  
  Reply
Farid says:

November 13, 2017 at 2:04 am

rotation :
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0

translation:
-320.0
-240.0
554.256258

Reply
Farid says:

November 13, 2017 at 10:11 am

Here I am using the exact same points for model-data and un-normalised image points, { 1, 3, 3, 1,-1, -1, -2, -2 }. I used the same values for model_data and the un-normalised image points, to check my output. Having used the same points for the two I was expecting to get identity matrix for rotation and zero’s for translation.
From your question I suspect I may have a wrong understanding of where the model-data points and un-normalised image points should come from. Could you please let me know from where should these values come from.
Could you please a give me sample of four or more points for model-data and un-normalised image points, for a case where the output should be no orientation and no translation.

Reply
1. nghiaho12 says:
  
  November 14, 2017 at 12:21 am
  
  You should never feed unnormalized image points into the RPP function, it expects all normalized values (image with focal length = 1 and (0,0) in the center of the image). Also, you can’t have “no translation” the object has to be in front of the camera. An example model for a business card would be the 4 corners, measured in millimeters (arbitrary choice), with (0,0,0) in the center. If you take a picture of this card, open the image in an image viewer and find the 4 corners, the coordinates are unormalized, where (0,0) is the top left of the image. If you apply inv(K) to each of those image points you get the normalized points, which you feed into RPP. If you take a picture of the card straight on, you’ll have no rotation, but will some have translation, in millimeters, from the camera.
  
  Reply
Farid says:

November 14, 2017 at 2:32 am

Thank you for your comments. So for the model-data points say for the business card example, we input the corners of the card measured in mm, where the centre of the card is at (0,0). For the un-normalised image points we enter the corners’ coordinates in pixels, where the top left of the image is (0,0). Then normalise the image points and feed them into RPP.

Reply
1. nghiaho12 says:
  
  November 30, 2017 at 11:43 pm
  
  Yup that’s right.
  
  Reply

Nghia Ho