Pose Estimation For Planar Target

This is an OpenCV port of Robust Pose Estimation from a Planar Target (2006) by Gerald Schweighofer and Axel Pinz using their Matlab code from the link in the paper. It is used to determine the pose of a planar target. It can be used for Augmented Reality to track a planar target, such as a business card.

The algorithm is very robust and does not require accurate calibration of the camera for Augmented Reality applications. I found that a rough guess for the FOV of my webcam worked.

Download

Last update: 12 March 2012
RobustPlanarPose-1.1.2.zip

Changes from last version

  • Added a check to see if the root finding code returned any valid solutions (thanks to csantos)

Requirements

  • OpenCV 2.x or later

Usage

On Linux, type make to compile the demo code and ./demo to see how it works. On Windows you have to make your own project.

Note: For Visual Studio users you might get a compiler error about undefined M_PI_2, please visit http://msdn.microsoft.com/en-us/library/4hwaceh6.aspx for a fix.

To use this code in your own program you only need to call the function

RPP:Rpp(const cv::Mat &model_3D, const cv::Mat &iprts, cv::Mat &Rlu, cv::Mat &tlu, int &it1, double &obj_err1, double &img_err1);

A description of the parameters are as follows:

model_3D are the fixed 3D points of the object you want to track, they should never change throughout the program. For example, say we have a square card we want to track, we can arbitrarily choose (-1,-1,0), (-1,1,0), (1,1,0), (1,-1,0) as the 4 corners point in 3D space. The values you choose will determine the reference frame for which the estimated pose will be relative to. Using the same example, (0,0,0) in this case will be at the centre of the square card.

iprts are the normalised image points. You will need to form a typical 3×3 camera matrix to normalise the points, which we shall call K. K typically looks like

K = \left[\begin{array}{ccc}fx & 0 & cx \\ 0 & fy & cy \\ 0 & 0 & 1 \end{array}\right]

fx/fy are the focal lengths (in pixels) and cx/cy are the optical centres. For Augmented Reality application, I find you can get away with rough approximations. For example, say the webcam has a horizontal FOV of about 60 degrees and the image size is 640×480. Then the 4 parameters are:

cx = 640/2 = 320
cy = 480/2 = 240
fx = fy = cx/tan(60/2 * pi / 180) = 554.26

Now that you have K, to normalise the points simply do

\left[\begin{array}{c} x^{\prime} \\ y^{\prime} \\ 1 \end{array}\right] = K^{-1} \left[\begin{array}{c} x \\ y \\ 1 \end{array}\right]

where x and y are the un-normalised image points

Rlu
is the returned 3×3 rotation matrix. If you pass in a valid 3×3 matrix the it will be used as the initial guess of the rotation. In a tracking scenario you would continuously feedback the returned Rlu as input for the next frame.

tlu is the returned 3×1 translation vector

it1 is the number of iterations done

obj_err1 is the object projection error in 3D

img_err1 is the image projection error, in normalised image point space

License

Simplified BSD License, unless superseded by a license from the original Matlab code or the copy and paste code for Rpoly.cpp.

82 thoughts on “Pose Estimation For Planar Target”

  1. hi

    I got this result when i run program

    Rotation matrix
    0.833827 -0.329443 -0.442944
    0.034358 -0.769870 0.637275
    -0.550955 -0.546596 -0.630620

    Translation matrix
    -0.115157
    1.284543
    11.195500

    Number of iterations: 29
    Object error: 0.020854
    Image error: 0.066171

    Failure. No convergence after 20 shifts. Program terminated.
    Input normalised image points (using 3×3 camera intrinsic matrix):
    0: (-0.0168, 0.0377)
    1: (0.0277, 0.0373)
    2: (-0.0824, 0.0386)
    3: (0.0317, 0.036)
    4: (-0.1015, -0.008)
    5: (0.0866, 0.1179)
    6: (0.1233, 0.1035)
    7: (0.0667, 0.1102)
    8: (0.0969, 0.166)
    9: (0.0622, 0.1608)

    Input model points (fixed 3D points chosen by the user, z is fixed to 0.0):
    0: (0.0685, 0.6383, 0.0)
    1: (0.4558, 0.7411, 0.0)
    2: (-0.7219, 0.7081, 0.0)
    3: (0.7061, 0.2887, 0.0)
    4: (-0.9521, -0.2553, 0.0)
    5: (0.4636, 0.0159, 0.0)
    6: (-0.101, 0.2817, 0.0)
    7: (0.6638, 0.1582, 0.0)
    8: (0.3925, -0.7954, 0.0)
    9: (0.6965, -0.7795, 0.0)

    Pose found by RPP

    Rotation matrix
    0.833820 -0.329449 -0.442954
    0.034357 -0.769873 0.637271
    -0.550967 -0.546588 -0.630617

    Translation matrix
    -0.115156
    1.284540
    11.195475

    Number of iterations: 33
    Object error: 0.020854

    what is this msg is about
    “Failure. No convergence after 20 shifts. Program terminated.”

    and how to get same results as that in matlab version

    1. That error is from a root finding algorithm that I got off the net. That error is probably dependent on the matrix backend that OpenCV uses. You got different numerical values to the ground truth. What OpenCV version are you using?

        1. Eigen is enabled by default in OpenCV. My output is:

          Pose found by RPP

          Rotation matrix
          0.857628 -0.311788 0.408977
          0.160469 -0.593308 -0.788819
          0.488593 0.742141 -0.458806

          Translation matrix
          -0.108252
          1.266009
          11.198547

          which is close to Matlab results. I don’t get any errors. My gcc version is:

          gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)

          what compiler are you using?

        2. I have not tried it in Visual Studio. Maybe the default floating model behaves in an unexpected way. Try setting it to “strict” (somewhere in the optimization setting, floating point model) and see if that changes anything.

  2. i also got error when i complied first time is undefined symbol like M_PI_2 and i set _USE_MATH_DEFINES as preprocessor definition

      1. i tried with changing setting to strict but it gave me same results. i think my results are different because of this error
        Failure. No convergence after 20 shifts. Program terminated.
        you know how to resolve this

        1. This is what I get using Visual Studio 2010, 64bit mode.

          Pose found by RPP

          Rotation matrix
          0.857628 -0.311788 0.408977
          0.160469 -0.593308 -0.788819
          0.488593 0.742141 -0.458806

          Translation matrix
          -0.108252
          1.266009
          11.198547

          Number of iterations: 35
          Object error: 0.000471
          Image error: 0.066120

          Identical to what I got in Linux. I used a CLR visual studio project, with Common Language Runtime Support set to “No Common Language Runtime Support”, with your #define mentioned earlier for M_PI_2. And that’s it.

          1. i got lots of trouble in running CLR project and specially when i changed Common Language Runtime Support to “No Common Language Runtime Support”. Have you tried with standard win32 console application.

  3. Hey man, great job (assuming). I’ve been trying to run it on a VMWare’d instance of Ubuntu 11.04 with OpenCV 2.3.1 and gcc 4.5.2, but I always either get the convergence failure followed by a segfault or a glibc free() or fastbin memory error. Are any of these things you’ve run into or are they just problems on my end?

    1. I haven’t ran into them personally. But it sounds like there might be legitimate issues. I’m planning on cleaning the code up a bit and running it through Valgrind or something similar to find any bugs.

        1. Can you compile with the -g flag, edit Makefile. And run the program via gdb ./demo, type run, and see where it crashes.

  4. Hi Nghia, excellent work. I have a question. I have implemented your code in combination with colour segmentation based point tracking. I am tracking a 100x100mm square, with the corners indicated by coloured dots, using a HD webcam. I have defined the (planar) object as follows:

    Object model:
    // Object model
    // First row holds x-coordinates, second row holds corresponding y-coordinates
    double modelPoints_data[8] = { -0.05, 0.05, -0.05, 0.05,
    0.05, 0.05, -0.05, -0.05};

    .. Later ..

    // Insert data into RPP model input matrix
    for(int i=0; i < 4; i++) {
    modelPoints.at(0,i) = modelPoints_data[i]; // x
    modelPoints.at(1,i) = modelPoints_data[i+4]; // y
    }

    E.g. similar to your demo code. You see the square center is the origin of the coordinate system, and the points as follows:

    ___________
    | 1 2 |
    | |
    | |
    |_3______4_|

    With the program running, I output the modelPoints and imagePoints matrices (the latter holds the normalized image points, using a camera matrix equal to your example of 60 deg FoV, which seems to be accurate for this camera):

    objectPoints:
    -0.050000 0.050000 -0.050000 0.050000
    0.050000 0.050000 -0.050000 -0.050000
    0.000000 0.000000 0.000000 0.000000

    imagePoints:
    -0.049345 0.114326 -0.040100 0.123716
    -0.042747 -0.052067 0.119452 0.112352
    1.000000 1.000000 1.000000 1.000000

    The object is in this case straight in front of the camera, and I manually select the points by mouse in the order as they are defined in the object model. Thus I would expect near-zero x and y translation, say 0.3m z-translation, and an near-identity rotation matrix.

    The resulting translation vector seems to be very accurate according to initial measurements, however I am somewhat puzzled by the rotation matrix:

    Rotation matrix:
    0.994831 -0.056219 -0.084559
    -0.054371 -0.998232 0.024007
    -0.085760 -0.019286 -0.996129

    This is not approximately an identity matrix as I expected, as two of the diagonals are -1. The values correspond to a rotation of +- pi radians about the x-axis of the object. If I select the points in “wrong” order, that corresponds to this rotation, I do obtain the expected identity matrix.

    Do you have any thoughts on this?

    1. Hmmm, is this a mouse Y co-ordinate inversion issue? You have positive Y as pointing “up”, but screen co-ordinate points “down”.

      1. You are exactly right. Thank you. I solved this by changing


        // Insert data into RPP image input matrix
        for(int i=0; i < 4; i++) {
        imagePoints.at(0,i) = pointPositions[i][0]; // x
        imagePoints.at(1,i) = yMax-pointPositions[i][1]; // y

        // normalize image points
        imagePoints.col(i) = Kinv * imagePoints.col(i);
        }

        into


        // Insert data into RPP image input matrix
        for(int i=0; i < 4; i++) {
        imagePoints.at(0,i) = pointPositions[i][0]; // x
        imagePoints.at(1,i) = yMax-pointPositions[i][1]; // y; yMax-value displaces the origin to bottom-left instead of OpenCV's top-left

        // normalize image points
        imagePoints.col(i) = Kinv * imagePoints.col(i);
        }

        As indeed the OpenCV image origin is top-left and the camera origin is bottom-left. However one more question; after doing this the results are correct, but the iterations per pose estimation has gone up from 1 (for no rotation, i.e. camera directly in front of target), to the range of 200-700. As the no rotation case should be the easiest case to estimate, this is unexpected. What is the cause of this?

        I suppose it could be mended by performing the estimation inverted and then doing a rotation afterwards, but of course this would not be feasible.

        1. Try re-using your last rotation matrix found as the input into the RPP input. By default, if you give it an unitialised matrix, RPP will do a random initialisation to try and find the answer.

          1. Thanks for the tip. Indeed I did do that before. That did not help in this particular case; I suspect the ambiguity of the target (square) was the cause of the problem. Currently I’m using an unambiguous target, which results in only 1-20 iterations usually.

  5. Doctor Nghia,
    Thank you very much for the source code and explanation.

    I think you should post the source code to GitHub, it’s a great place to share source code (easy to view, update source code, easy for others to comment, contribute etc.):
    https://github.com/

  6. Hello friend,
    Under certain conditions, the error “No convergence after 20 shifts” seems to be inevitable. In such conditions, the program dies with a segfault error. My suggestion to correct this issue is by adding the following line:
    if(!bl.size()) return false;
    In the file RPP.cpp, line 882.

    Thanks

  7. Could you explain the demo a bit more? I’m curious as to where you got the values for the error calculation as well as the meaning of the pairs of numbers (0,1),(0,2),(2,1) etc. used in the error calculation. Also, if there was a normalization function for the image points, that would be awesome.

    1. Hi,

      The demo uses the original values from the Matlab code I ported. The demo tries to find the 3D rotation/translation for transforming model_data -> iprts_data (3D -> 2D). The model_data is assumed to be a plane lying on z=0.

      The error calculation is a simple comparison between the results from demo.cpp and the original Matlab values I got (ground truth). I’m just comparing each value in the matrix against the known answer.

      I just realised the comment “Using fabs instead of sum square, in case the signs are opposite” does not match what I actually did 🙂 Should have deleted that comment.

      1. I see, thanks.

        Do you think you could sanity check my normalization function?

        void normalizeImagePoints(double * pointVector, Mat & imagePoints, int pointCount,
        int screenWidth, int screenHeight, int FOV )
        {
        /*normalizedPoints = (double *)malloc(sizeof(double) * pointCount);*/
        double cx = screenWidth/2, cy = screenHeight/2;
        double fx , fy;
        fx = fy = cx / tan(FOV/2 * 3.14 / 180);

        Mat kMat = Mat( 3, 3, CV_64F);
        Mat kInv = Mat( 3, 3, CV_64F);
        kMat.at(0,0) = fx;
        kMat.at(1,1) = fy;
        kMat.at(0,2) = cx;
        kMat.at(1,2) = cy;

        cv::invert(kMat,kInv);

        for (int i = 0; i < pointCount; i++)
        {
        imagePoints.at(0,i) = pointVector[i * 2];
        //pointVector[i]; // * 2];
        imagePoints.at(1,i) = screenHeight – pointVector[(i*2)+1];
        //screenHeight – pointVector[i+pointCount];//(i*2)+1];

        imagePoints.col(i) = kInv * imagePoints.col(i);
        }
        }

        Not getting the expected identity vector (function based on Wesley’s comment)… however my issue is that the 1’s column seems transposed (as well as with negatives…):
        Rotation Matrix:
        0.000897 -1.000000 -0.000041
        -0.000166 -0.000041 1.000000
        -1.000000 -0.000897 -0.000166

  8. the version RobustPlanarPose-1.1.2.zip downloaded from above link,but there is no difference between 1.1.2 and version 1.1.1.
    are you sure that you have upload the version 1.1.2?

    1. The link was incorrect, it’s fixed now. There’s only a tiny change in RPP.cpp at line 883

      // Suggestion by csantos
      if(bl.empty()) {
      return false;
      }

  9. hmm noticed some problems in that code that I pasted actually (uninitialized stack space, the one not being set in the K matrix, etc.), however after fixing it, my results are still fairly similar to what they were previously… :\

    1. Your code looks about right. Check to make sure your imagePoints is initialised correctly so that the third tow is all 1.

  10. Hi Nghiaho,

    First of all, thanks for your great work! I’ve been looking for a robust pose estimation algorithm for a while now and yours works really well. I tried your demo and the results are identical to your output.

    However, I’m experiencing some problem when my planar target is perpendicular to the camera and appears square on the image. For this specific case, the resulting pose shifts between each frame. I have drawn a line perpendicular to the planar target using the extrinsic parameters and this line keeps moving between each frame even if the coordinates of the corners are almost still. Do you know what may cause this behaviour?

    I’m using OpenCV 2.1, Visual Studio 2008 Pro, Labview 2011 SP1 and a Thorlabs DCC1545M camera with a Theia SY110 corrected wide-angle lens

    Thanks

      1. Hi again,

        Unfortunately, I can only use 4 points to estimate pose, but I made several functions to improve the corner detection and it is very accurate. However, this behaviour occurs more often when the target is farther from the camera thus having less accuracy on the position. I thought the precision I had was enough but it probably isn’t.

        Thanks for your reply!

        1. For my augmented reality stuff there was a significant improvement as the number of points used increased. It’s a shame you can’t use more than 4.

          1. I have exactly the same problem, I’m using a square target and I was using only 4 points, I increased the number of points to 8 points; it is better but I still face the same problem, any clues?
            Another problem is I get very weird results when I don’t provide an initial pose, which I get from basic homography computation.

          2. The algorithm is probably very sensitive to noise. You want as much as points as possible. In my AR stuff I used as much point as possible for the homography calculation. Once I had a stable homography I used it to compute 4 corner points as input to the pose algorithm. I think I did this for speed reasons, because it’s cheaper to calculate the homography compared to the pose.

          3. Are you using square targets, and if you are did you try putting the camera orthogonal to the square target?

          4. I’ve tried with a rectangle business card and square CD case. Both work as long as there are enough points to match. Are your points normalised using the camera’s intrinsic matrix?

          5. Yes the points are normalized, and I use the camera intrinsic matrix after doing the camera checkboard calibration. But I’m wondering what are the points you are using other than the 4 corner points?

            P.S If I’m flooding you with comments, I’ll be glad if you contact me via email, and thanks for your support

          6. I’m using points within the plane itself. For example, a CD cover has an artwork with lots of visual feature points that can be used. What is your object?

            Replying via comments is fine, might help others in the process.

          7. I’m using a marker bold square with some code in it. So let me ask are you measuring the points in 3D then project it, in other words how do you find the 2D-3D mapping knowing that the tracked object can differ?

          8. I have an offline image of the CD cover stored in memory. The CD cover image was taken so it faces directly in front of the camera. I assign it some arbitrary 3D co-ordinates. When I do the initial 2D-2D match, I find where the 2D points land on the offline image and from this I can get a 2D-3D match.

          9. Ummm, so its not a 100% online approach. I’m not sure but do you think there’s a better way?

          10. I meant in your approach you were using a previously captured image. I was thinking may be I can use a markerless tracking system along with a marker-based one and try to find the 3D mappings

  11. Dear Nghia,

    Great work! That helps me a lot for implementing an AR system.

    Still, I am stuck at a rendering problem about how to set the OpenGL modelview matrix using the estimated rotation and translation matrix from RPP. It seems the signs of some values need to be changed to negative (in your code commented with “bit of guess work”), but I don’t know why and which exactly terms to change. Could you help to explain that ?

    Thank you very much!

    Best,
    Arthur

    1. I’ve got code in the MarkerlessARDemo that does exactly what you want. Download the code and look for this bit in ARPanel.cpp:

      // Grab the translation/rotation information
      m_t = g_AR.GetTranslation();

      // NOTE the negative -z, OpenGL postive z is out of the sceen
      glTranslatef(m_t.at(0,0), m_t.at(1,0), -m_t.at(2,0));

      // Apply camera rotation to the model
      m_rot = g_AR.GetRotation();
      m_rot = g_AR.CorrectRotationForOpenGL(m_rot);
      MultRotationMatrix(m_ro );

      // Draw our teapot model

  12. What are the required changes if I want to use the image points in pixels not the normalized ones?
    Another question, I don’t see in your code anything about the intrinsic parameters of the camera, or am I missing something?

    1. I don’t think that algorithm can work without normalized points. I’ve found you don’t need an accurate intrinsic matrix to get usable results. When I used it on my webcam I just took a guess and never bothered to confirm it via checkboard calibration. The intrinsic parameter is the K matrix mentioned on the page.

  13. Hi Thanks for the great work.
    I came across this algorithm and want to use it to replace the use of opencv’s builtin solvePnP to calculate the camera’s rotational and translation matrices based on corners of detected markers. Then I’m using the two matrices to project some 3d points back to the image plane. It worked fine with using solvePnP but I get some jittering effects and that’s why I wanted to try this out.

    The problem when I started using this algorithm is that I get really odd solutions:
    data for Rpp
    [0, 0, 1, 1;
    0, 1, 1, 0;
    0, 0, 0, 0]
    [141.5075073242188, 148.9447631835938, 265.8343505859375, 272.9588012695313;
    379.0839538574219, 266.201904296875, 264.2275085449219, 371.3097839355469;
    1, 1, 1, 1]

    (vector of point3f)
    data for solvepnp
    [0, 0, 0; 0, 1, 0; 1, 1, 0; 1, 0, 0]

    (vector of point2f)
    [141.50751, 379.08395; 148.94476, 266.2019; 265.83435, 264.22751; 272.9588, 371.
    30978]

    result from Rpp
    Rotation matrix
    [0.9986703642011102, -0.05155098095899455, -5.346740956264395e-006;
    -0.05155097733117901, -0.9986703235488186, 0.0002856555596111147;
    -2.006545583506777e-005, -0.0002850001120310718, -0.999999959186156]

    Translation matrix
    [1.211351025315951; 3.136464760403361; 0.008283444467592525]

    result from solvePnP
    Rotation matrix
    [2.56306007280152; -0.0845208981807476; 0.2373663176224296]
    [-1.363359203478092; 1.056087155784301; 4.236560203317223]

    Translation matrix
    [4.61047e+019, 1.18111e+020]

    I’m wondering if it’s because of using this algorithm incorrectly?

    1. Sorry, I didn’t convert the result of raux back to 3×3 matrices. I redid the calculation and is shown below.

      data for Rpp
      [0, 0, 1, 1;
      0, 1, 1, 0;
      0, 0, 0, 0]

      [218, 237.3515777587891, 420.9180297851563, 447.0541687011719;
      422, 245.8782348632813, 244.9852905273438, 426.8499755859375;
      1, 1, 1, 1]

      data for solvepnp
      [0, 0, 0; 0, 1, 0; 1, 1, 0; 1, 0, 0]

      [218, 422; 237.35158, 245.87823; 420.91803, 244.98529; 447.05417, 426.84998]

      result from Rpp
      [0.07548107378172445, 0.9971429198010968, -0.002933426546021601;
      -0.9971421371442634, 0.07547095480045923, -0.0034195486365947;
      -0.003188390209630636, 0.003183154418194693, 0.9999898507964076]

      [1.154120170651896; 2.191182270703893; 0.005600938552510155]

      result from solvePnP

      [-0.79416007, -0.59182167, -0.1380467;
      0.39268467, -0.67311519, 0.62666953;
      -0.46379793, 0.44346708, 0.7669605]

      [-0.45007941; 0.80028373; 2.418669]

      1. Okay, never mind I realized the problem was not normalising the image points. I fixed the problem by normalizing the points using undistortPoints with camera matrices.

        I ran into another problem with RPP. It works fine the first time I call the function, but when I call the function again with a previous Rotational matrix, I get the following errorr

        OpenCV Error: Assertion failed (a_size.width == len) in unknown function, file .
        .\..\..\src\opencv\modules\core\src\matmul.cpp, line 718

      2. sorry for spamming this. But i realized the problem was I converted the rotation matrix to 3X1 and passed it back to the function in the second loop.

  14. Just as a final update, it turns out that the error between using RPP and solvePNP is less than 1e-5 for both the rotational and translation matrices

    1. Hey there,

      Seems like you got everything under control! Thanks for the doing a comparison between RPP and solvePNP, nice to know.

  15. Hi

    Happy new year and thanks for the porting to C++, it’s quite clear how to use it.
    I compiled it easily with visual studio 2013, but now I would like to use it in an Android app, and I need to compile it with the Android NDK, have you ever tried that? I get a lot of errors when doing so…

    1. Well, I figured what was the problem. in RPP.h ObjPose is declared as
      void ObjPose(const cv::Mat P…….
      and in RPP.cpp as
      void ObjPose(const cv::Mat _P,……

      The compiler did not like the switch between _P and P.

  16. Hi, thanks very much for the code.
    Originally I used cv::solvePNP for a planar pattern pose estimation. However, I found that in some cases (where the pattern face frontal parallel to the camera), the function returns unstable results from time to time. I presume there is an ambiguous case to be removed, so I applied Schweighofer’s paper for solving the issue. Essentially I applied your RPP code by using an initial guess calculated from the solvePNP. It still gives me similar unstable result, so the RPP is not able to remove the ambiguous case.

    Do you have any suggestion for this?

    Thanks very much!

    1. Maybe the unstable solvePNP results give a bad initial value to to RPP thus confuse the latter algorithm. Please try to use the history value of RPP to do initialization.

  17. How can i fix it?
    First-chance exception at 0x00007FFC16F17788 in Pose-Estimation.exe: Microsoft C++ exception: cv::Exception at memory location 0x0000003019AFABA8.
    The thread 0x1f90 has exited with code 0 (0x0).
    The thread 0xfd0 has exited with code 0 (0x0).
    The thread 0x167c has exited with code 0 (0x0).
    The program ‘[4656] Pose-Estimation.exe’ has exited with code 0 (0x0).

Leave a Reply

Your email address will not be published. Required fields are marked *