Markerless Augmented Reality

This project is no longer maintained. Have a look at NAR Demo instead.

I’ve recently become interested in augmented reality and decided to write an AR system to learn how it works. Currently the system can track a reasonably feature-rich planar object. Some technical features of the current system:

The program runs quite well with my laptop’s webcam, but I’m unable to move the planar object quickly because my dimly lit room causes a lot of motion blur. For this reason I have not yet implemented a feature tracker to speed up and improve pose estimation. Nonetheless, it runs quite well without one.


Last update 11 August 2011


Changes from last version

  • Bug fixed in RPP.cpp, incorrect matrix indexing
  • Renamed misleading ExtractNormalisedPatch() function to GetMeanSubtractedPatch()


Requirements

  • Linux (anything recent is fine, I used Ubuntu)
  • CUDA-capable NVIDIA graphics card
  • CodeBlocks IDE
  • wxWidgets development libraries
  • Nvidia CUDA libraries installed at /usr/local/cuda
  • OpenCV 2.x or later (I used the SVN version)
  • Webcam/camera supported by OpenCV

To compile, open the project in CodeBlocks. To run it, type the following in the project directory:


You have to run it from the project directory because it relies on other directories relative to it.

If you want control over how the CUDA .cu files are compiled, right-click on the file and go to Properties -> Advanced. By default it compiles for devices with CUDA compute capability 1.2.

The code is almost platform independent; only the timing code is Linux specific, so it should be possible to compile this on Windows with some patience.


I tried to make the interface as easy to use as possible. Instructions are displayed where needed, so hopefully using it will be obvious.

Importing Blender models

The Wavefront obj loader is still very alpha. I have not tested it on a variety of models, so don’t be surprised if things break! To use your own model, export it as a Wavefront obj using the following options:

Make sure you manually select the model of interest beforehand, otherwise it will export an empty file. The export options have to be selected EXACTLY as shown, at least for now. Blender will generate a .obj and a .mtl file. I haven’t added a clean interface for loading 3D models yet (it’s next on my TODO list), so for now just overwrite Tux.obj and Tux.mtl in the models directory. The original model of Tux is from


Markerless Augmented Reality Demo App

38 thoughts on “Markerless Augmented Reality”

  1. Hi,
    I’m a computer science student and I’m making an iPhone application that needs markerless augmented reality.
    Could you post the source code or send it to me?
    Thank you.

    1. The code is a bit of a mess at the moment. I want to post it once I’ve cleaned it up a bit, but it’ll be a few days because I have an arm injury, so coding is slow 😐

      1. Well, thanks for your reply, and I hope you get well soon.
        I was also wondering how to track FAST corners in OpenCV.
        Thank you.

        1. You can use the KLT algorithm for tracking in OpenCV. But for AR I think you might be better off writing your own tracker using block matching plus position prediction.
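A minimal sketch of the block-matching idea, assuming a plain row-major 8-bit greyscale image (GreyImage, Point, blockSAD and trackCorner are hypothetical names, not OpenCV types or project code) and a constant-velocity motion model: the corner’s next position is predicted from its last displacement, then a small window around the prediction is scanned for the block with the lowest sum of absolute differences (SAD). The block radius of 3 and search radius of 8 are illustrative values only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical minimal greyscale image: row-major 8-bit pixels.
struct GreyImage {
    int width, height;
    std::vector<std::uint8_t> pixels;
    std::uint8_t at(int x, int y) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

struct Point { int x, y; };

// Sum of absolute differences between the (2r+1)x(2r+1) block centred at a in prev
// and the block centred at b in curr. Returns LONG_MAX if either block leaves its image.
long blockSAD(const GreyImage& prev, Point a, const GreyImage& curr, Point b, int r) {
    if (a.x - r < 0 || a.y - r < 0 || a.x + r >= prev.width || a.y + r >= prev.height ||
        b.x - r < 0 || b.y - r < 0 || b.x + r >= curr.width || b.y + r >= curr.height)
        return std::numeric_limits<long>::max();
    long sad = 0;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            sad += std::abs(int(prev.at(a.x + dx, a.y + dy)) - int(curr.at(b.x + dx, b.y + dy)));
    return sad;
}

// Tracks one corner: predict its position with constant velocity, then search a small
// window around the prediction for the best-matching block.
Point trackCorner(const GreyImage& prev, const GreyImage& curr, Point pos, Point velocity,
                  int blockRadius = 3, int searchRadius = 8) {
    const Point predicted{pos.x + velocity.x, pos.y + velocity.y};
    Point best = predicted;
    long bestSAD = std::numeric_limits<long>::max();
    for (int sy = -searchRadius; sy <= searchRadius; ++sy)
        for (int sx = -searchRadius; sx <= searchRadius; ++sx) {
            const Point candidate{predicted.x + sx, predicted.y + sy};
            const long sad = blockSAD(prev, pos, curr, candidate, blockRadius);
            if (sad < bestSAD) { bestSAD = sad; best = candidate; }
        }
    return best;
}
```

A good prediction is what keeps this cheap: the search cost grows with the square of searchRadius, so the better the prediction, the smaller the window can be. Heavy motion blur will still defeat it, since the blocks stop resembling each other.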

  2. Hey, that’s a great achievement! I’m just wondering what computer specification you are using (achieving 15-20 fps is quite remarkable). And you didn’t use any descriptor? FAST only extracts keypoints, right?

    1. Hi,

      I’m using a laptop with a 1.6 GHz Intel i7. The video comes from the built-in webcam, which runs at 640×480. I’m currently playing around with a multithreaded + GPU version that I hope to release soon, just to see how much faster I can get it.

      The descriptor is based on Simon Taylor and Ed Rosten’s paper on high-speed feature matching, which I modified to be much simpler and faster to match, though I’m not sure about its robustness. Each FAST corner ends up being described by 64 binary values (8 bytes).

      1. Multithreading with augmented reality? Sorry, I’m still doing my undergraduate programme, so I don’t quite get how it’s possible to multithread the detection, description and matching code. Aren’t they sequential tasks that can’t be parallelised?

        1. The multithreading part might have been a bit misleading 🙂 In reality all I’ve done so far is create one thread to capture video (buffering if required), another thread to do the AR processing, and another thread to render the graphics.

          The detection, descriptor and matching code is done sequentially. But each of those individual tasks can actually be run in parallel, which is suitable for the GPU.
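The three-thread layout described above is a producer/consumer pipeline. A sketch using only the C++11 standard library, assuming a bounded queue between stages; BoundedQueue, Frame, Pose and runPipeline are hypothetical names, and the comments mark where the real capture, AR processing and rendering would go.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical thread-safe bounded FIFO that hands items from one pipeline stage to the next.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Blocks while the queue is full, so a slow consumer throttles the producer.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [&] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        notEmpty_.notify_one();
    }

    // Blocks until an item arrives; returns false once the queue is closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [&] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        notFull_.notify_one();
        return true;
    }

    // Signals that no more items will be pushed.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        notEmpty_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
    std::queue<T> items_;
    std::size_t capacity_;
    bool closed_ = false;
};

struct Frame { int id; };     // placeholder for a captured camera image
struct Pose { int frameId; }; // placeholder for the AR result of one frame

// Capture -> AR processing -> rendering, one thread each. Returns the number of frames rendered.
int runPipeline(int numFrames, std::size_t bufferSize = 10) {
    BoundedQueue<Frame> captured(bufferSize);
    BoundedQueue<Pose> poses(bufferSize);
    int rendered = 0;  // only touched by the render thread until it is joined

    std::thread capture([&] {
        for (int i = 0; i < numFrames; ++i) captured.push(Frame{i});  // grab a camera frame here
        captured.close();
    });
    std::thread process([&] {
        Frame f;
        while (captured.pop(f)) poses.push(Pose{f.id});  // detect, describe, match, estimate pose here
        poses.close();
    });
    std::thread render([&] {
        Pose p;
        while (poses.pop(p)) ++rendered;  // draw the model using p here
    });

    capture.join();
    process.join();
    render.join();
    return rendered;
}
```

runPipeline(100) returns 100 once every frame has passed through all three stages. As noted in the replies, the capture and render threads still end up waiting on the AR stage; the bounded queue just keeps that wait from turning into an ever-growing buffer.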

          1. Haha, I thought so. But all three threads still have to wait for the AR thread, right? For the parallel part, did you try using Intel’s TBB?

        2. Yep, the threads do have to wait. A while ago I was contemplating writing the parallel part on the CPU using OpenMP, since it’s quick and easy, but I never got around to it.

  3. Hi!

    Nice work! I’m starting to look into markerless AR too 🙂 Are you planning to make your code available anytime soon?

    1. Hi,

      Good timing, I just updated this page and uploaded the code! It’s still a bit academic: it requires compiling, installing a bunch of libraries, and an NVIDIA graphics card.

      Let me know if you run into problems.

  4. Hi, I’m working on an AR project using the FAST algorithm. Your work is wonderful, thanks for sharing your code.
    While reading Simon’s paper on robust feature matching, I got confused by the section describing the interest points. What method do you use to build the database for the training reference image?

    1. The way I encode the feature is different from Simon’s paper, though the sampling is the same. At the feature’s (x,y) location I place a 16×16 square window and sample every second pixel. Since the window has an even size it won’t be perfectly centred, but that’s okay. This gives 8×8 sampled pixel values. I then take the average value of the patch and encode every pixel greater than the mean as “1” and every other pixel as “0”. The result is a 64-bit value, which can be stored in 8 bytes and matched efficiently using bitwise operations. This technique works well for FAST features because there is usually high contrast among the pixels at the detected location.
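The encoding described above can be sketched as follows, assuming a row-major 8-bit greyscale image stored in a std::vector; buildDescriptor and hammingDistance are hypothetical names, not the functions in AR.cpp (which uses its own “Bit” class). Matching two descriptors is an XOR followed by a population count, i.e. the Hamming distance.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: 64-bit descriptor for a feature at (x, y). The 16x16 window covers
// x-8 .. x+7 and y-8 .. y+7; every second pixel is sampled (8x8 = 64 samples), and bit k
// is set when sample k is greater than the mean of all 64 samples.
std::uint64_t buildDescriptor(const std::vector<std::uint8_t>& image, int width, int height,
                              int x, int y) {
    assert(x - 8 >= 0 && y - 8 >= 0 && x + 7 < width && y + 7 < height);
    int samples[64];
    int sum = 0;
    for (int row = 0; row < 8; ++row)
        for (int col = 0; col < 8; ++col) {
            const int v = image[static_cast<std::size_t>(y - 8 + 2 * row) * width + (x - 8 + 2 * col)];
            samples[row * 8 + col] = v;
            sum += v;
        }
    std::uint64_t bits = 0;
    for (int k = 0; k < 64; ++k)
        if (samples[k] * 64 > sum)  // same as samples[k] > sum / 64, without integer rounding
            bits |= std::uint64_t(1) << k;
    return bits;
}

// Number of differing bits between two descriptors: XOR, then count the set bits.
int hammingDistance(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}
```

For a 32×32 image whose left half is 50 and right half is 200, the descriptor at (16, 16) is 0xF0F0F0F0F0F0F0F0: in each row of samples only the right four columns lie above the mean of 125.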

      1. Thanks for your response. I now have all the interest points in the training image and the sparse 8×8 sampling window (the method from Simon’s paper). The next step is similar to yours: take the average of the 64 samples, but each sample is described by 5 bits, so each descriptor is 8×8×5/8 = 40 bytes. I’m not sure how to organise all the samples for one interest point, or all the interest points for one image.
        Could you please explain how this is implemented in your work, and which file computes the training image?
        Best regards.

        1. The main code is in AR/AR.cpp, function void AR::ExtractModelFeatures()

          Line 249 is where it grabs the patch and stores it in the array normalisedPatch.

          if(!ExtractNormalisedPatch(m_model_warped_grey, x, y, normalisedPatch, orientation)) {

          By the way, the function name is misleading, it was something from an older experiment. It should just be called ExtractPatch or GrabPatch 🙂 I’ll fix this in a later release.

          That function also grabs the pixels and subtracts the mean from the pixel values (which isn’t actually necessary).

          Then at line 272 it decides which pixel values should be “1” or “0”. I have a class called “Bit” that handles the bit manipulation. For testing/debugging you can instead store it as a char[64].

  5. Hello, in Simon’s paper the training set of source images is built as follows: “7 different scale ranges and 36 different camera axis rotation ranges were used, giving a total of 252 viewpoint bins.”
    Could you please tell me how you build the training images in your code? Thank you very much!

    1. Hi,

      I used a similar approach, with 6 scale ranges (2 octave levels and 3 inter-octave levels) and 4 out-of-plane rotations about each of the X and Y axes (8 in total), giving 6 × 8 = 48 viewpoint bins. In his later paper Simon proposed a rotation-invariant version of his signature, which I use as well. In my code, the function ExtractModelFeatures() in AR/AR.cpp does this part. It uses OpenCV to do an affine warp on the feature patch.
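A sketch of how the 48 viewpoint bins could be enumerated. It assumes the six scales are 2^-(octave + step/3) for two octaves with three steps each, and uses the 0, 30, 45 and 60 degree angles that a later comment reports for m_out_of_plane_rotations. The exact values in ExtractModelFeatures() may differ; Axis, Viewpoint and enumerateViewpoints are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <vector>

enum class Axis { X, Y };

// One training viewpoint: an image scale plus an out-of-plane rotation about one axis.
struct Viewpoint {
    double scale;
    Axis axis;
    double angleDeg;
};

// 6 scales x (4 rotations about X + 4 rotations about Y) = 48 bins.
// Assumed values: scale = 2^-(octave + step/3), angles 0/30/45/60 degrees.
std::vector<Viewpoint> enumerateViewpoints() {
    const double angles[] = {0.0, 30.0, 45.0, 60.0};
    std::vector<Viewpoint> bins;
    for (int octave = 0; octave < 2; ++octave)
        for (int step = 0; step < 3; ++step) {
            const double scale = std::pow(2.0, -(octave + step / 3.0));
            for (Axis axis : {Axis::X, Axis::Y})
                for (double a : angles) bins.push_back({scale, axis, a});
        }
    assert(bins.size() == 48);
    return bins;
}
```

Each bin would then become one affine warp of the model image, with the features extracted from that warped image stored under the bin. With these angles the 0 degree entry appears once per axis, so the frontal view at each scale is generated twice.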

      1. Thank you very much!
        I’m facing a new problem: when I save the “Snapshot” image, the image from the camera is completely black, no matter which format I use. In CaptureTargetPanel.cpp, in the function “OnCropTargetBtnClick”, what does this code mean?

        // Opengl needs width multiple of 4
        int width = model.cols;
        width = (width/4)*4 + 4;

        1. The frame from the camera shown in CaptureTargetPanel is OK. Strangely, after clicking the “Take snapshot” button, the saved image is completely black. Why?

          1. For now, perhaps try saving the image of the model using different software, or press Print Screen and cut the model out using MS Paint or Photoshop. Then load the model image from the main screen. Are you using OpenCV 2.2.x? If the problem persists, maybe try 2.3.0.

        2. That rounds the image width up to a multiple of 4. An image of 123×123 will not load into OpenGL properly; its width needs to be rounded up, giving 124×123.
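A sketch of a round-up that leaves widths already divisible by 4 unchanged. As quoted, (width/4)*4 + 4 gives 124 for 123 but also turns 124 into 128, padding 4 extra columns; roundUpToMultipleOf4 is a hypothetical helper name.

```cpp
#include <cassert>

// Hypothetical helper: smallest multiple of 4 that is >= width.
int roundUpToMultipleOf4(int width) {
    assert(width >= 0);
    return (width + 3) / 4 * 4;
}
```

If you’d rather not pad the image at all, glPixelStorei(GL_UNPACK_ALIGNMENT, 1) tells OpenGL that pixel rows are byte-aligned. The default unpack alignment is 4, which is why rows whose byte length isn’t a multiple of 4 load incorrectly.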


  6. Program error output:

    Max hit: 0
    Total poses in the database: 84
    KTree 0

    Cuda error in file ‘/home/mcc-lxy/MarkerlessARDemo/AR/’ in line 96 : invalid texture reference.

  7. Of course I clicked the 2 corners in the camera frame.
    My camera is a Logitech QuickCam C100.
    Should I increase the frame buffer value (the default is only 10)?

    1. You shouldn’t increase the buffer. It won’t help if your computer can’t process the buffer queue fast enough.

  8. At lines 225 and 475 in AR/AR.cpp, WarpModel(scale, 0.0, xscale, yscale) is used to warp the model image. But I found that the 2nd parameter of WarpModel is always zero. Why? Also, reading the code I found “xscale = m_out_of_plane_rotations[0~3] = 0, 30, 45, 60”, and I think these values are angles rather than scales. What does the code “,0) = xscale*scale;” mean? Would you please explain it to me? Thank you very much!

    1. The 0.0 parameter is the rotation. I originally implemented Simon’s older version, where he warped the image for different rotations. Later he wrote a rotation-invariant version. I implemented that version, so I no longer needed to generate images for different rotations; that’s why it’s always zero.

      I wrote a quick PDF explaining out-of-plane rotation.

      Hopefully it’ll explain everything.
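The PDF isn’t reproduced here, but a common way to approximate an out-of-plane rotation with an affine warp is weak perspective: tilting a plane by θ about the vertical (Y) axis shrinks its width by cos θ, and tilting about the horizontal (X) axis shrinks its height. The sketch below builds the 2×3 matrix, in the layout cv::warpAffine expects, scaling about the image centre (cx, cy). Whether WarpModel converts the angles exactly this way is an assumption; Affine2x3 and outOfPlaneWarp are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Affine2x3 = std::array<std::array<double, 3>, 2>;

// Hypothetical sketch: weak-perspective approximation of tilting the model plane,
// combined with a uniform scale, applied about the centre (cx, cy).
Affine2x3 outOfPlaneWarp(double scale, double tiltAboutYDeg, double tiltAboutXDeg,
                         double cx, double cy) {
    assert(scale > 0.0);
    const double kPi = 3.14159265358979323846;
    const double xscale = std::cos(tiltAboutYDeg * kPi / 180.0);  // horizontal foreshortening
    const double yscale = std::cos(tiltAboutXDeg * kPi / 180.0);  // vertical foreshortening
    const double a = xscale * scale;
    const double b = yscale * scale;
    // x' = a * (x - cx) + cx,  y' = b * (y - cy) + cy
    Affine2x3 m = {{{a, 0.0, (1.0 - a) * cx}, {0.0, b, (1.0 - b) * cy}}};
    return m;
}
```

For example, a 60 degree tilt about Y with scale 1 gives a horizontal factor of 0.5, so a 200-pixel-wide model centred at cx = 100 is squeezed into the 100 pixels between x = 50 and x = 150.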

  9. Hi,
    I’m new to AR and I’m currently working on it as well. I’m developing on Windows, and I’d like to know whether it’s possible to use your code on a computer without an NVIDIA graphics card. Going through your code I see you use CUDA, and as far as I know CUDA requires an NVIDIA graphics card.


    1. Hi,

      It is possible to use my code without CUDA. At the moment CUDA is used to build the feature descriptors and do the matching. There is already a CPU-only version of the descriptor code, used on startup when the features are extracted from the model. To do fast matching, you’ll probably have to use OpenCV’s FLANN module. I’ve been meaning to update the code with a #define switch for people with or without CUDA, but haven’t gotten around to it.
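A sketch of such a #define switch, assuming a hypothetical USE_CUDA macro and a hypothetical matchDescriptorsGPU function for the CUDA path. The CPU fallback here is a brute-force nearest neighbour over 64-bit descriptors using Hamming distance, not OpenCV’s FLANN: it costs O(queries × database) but is simple, and since each query is independent an OpenMP pragma can spread the loop over CPU cores.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef USE_CUDA
// Hypothetical GPU matcher, implemented in a .cu file when CUDA is available.
std::vector<int> matchDescriptorsGPU(const std::vector<std::uint64_t>& queries,
                                     const std::vector<std::uint64_t>& database);
#endif

// For each query descriptor, returns the index of the database descriptor with the
// smallest Hamming distance, or -1 if the database is empty.
std::vector<int> matchDescriptors(const std::vector<std::uint64_t>& queries,
                                  const std::vector<std::uint64_t>& database) {
#ifdef USE_CUDA
    return matchDescriptorsGPU(queries, database);
#else
    const int n = static_cast<int>(queries.size());
    std::vector<int> best(queries.size(), -1);
    // Queries are independent: with -fopenmp this loop runs across CPU cores,
    // without OpenMP the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int bestDistance = 65;  // larger than any possible 64-bit Hamming distance
        for (std::size_t j = 0; j < database.size(); ++j) {
            const int d = static_cast<int>(std::bitset<64>(queries[i] ^ database[j]).count());
            if (d < bestDistance) {
                bestDistance = d;
                best[i] = static_cast<int>(j);
            }
        }
    }
    return best;
#endif
}
```

Building with -DUSE_CUDA would select the GPU path; without it, the same call site compiles to the CPU version.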

  10. Hi, I’m also working on markerless augmented reality, but I use SIFT/SURF features and RANSAC to estimate a homography. From the 3×3 homography I then need the OpenGL transformation matrices so I can start the AR rendering: a modelview matrix (based on the camera’s extrinsic parameters) and a projection matrix (based on the camera’s intrinsic parameters). Which part of your code estimates these two matrices? Could you please help me with this?
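For a planar target lying at Z = 0, the homography and the pose are related by H ∝ K [r1 r2 t], so once the intrinsics K are known the pose can be pulled out of H directly. This project estimates pose with RPP instead; the sketch below shows the standard homography decomposition, assuming a skew-free pinhole K. Vec3, Mat3, CameraIntrinsics, Pose and poseFromHomography are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

struct CameraIntrinsics { double fx, fy, cx, cy; };
struct Pose { Mat3 R; Vec3 t; };    // columns of R are r1, r2, r3

// Applies K^-1 to v for a pinhole camera without skew.
Vec3 applyKInverse(const CameraIntrinsics& K, const Vec3& v) {
    return {(v[0] - K.cx * v[2]) / K.fx, (v[1] - K.cy * v[2]) / K.fy, v[2]};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}

// Recovers r1, r2 and t from H ~ K [r1 r2 t] up to scale, then r3 = r1 x r2.
// No orthonormalisation is done, so with a noisy H the returned R is only approximately a rotation.
Pose poseFromHomography(const Mat3& H, const CameraIntrinsics& K) {
    const Vec3 h1{H[0][0], H[1][0], H[2][0]}, h2{H[0][1], H[1][1], H[2][1]}, h3{H[0][2], H[1][2], H[2][2]};
    const Vec3 a = applyKInverse(K, h1), b = applyKInverse(K, h2), c = applyKInverse(K, h3);
    const double norm = std::sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
    assert(norm > 0.0);
    double lambda = 1.0 / norm;
    if (c[2] * lambda < 0.0) lambda = -lambda;  // keep the target in front of the camera (t_z > 0)
    Vec3 r1, r2, t;
    for (int i = 0; i < 3; ++i) { r1[i] = lambda * a[i]; r2[i] = lambda * b[i]; t[i] = lambda * c[i]; }
    const Vec3 r3 = cross(r1, r2);
    Pose p;
    for (int i = 0; i < 3; ++i) { p.R[i][0] = r1[i]; p.R[i][1] = r2[i]; p.R[i][2] = r3[i]; }
    p.t = t;
    return p;
}
```

With noisy homographies r1 and r2 won’t be exactly orthonormal; a common fix is to replace R with the nearest rotation matrix via an SVD before using it.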

    1. Hi,

      The code you’re looking for starts at line 699 in AR.cpp. It calls RPP::Rpp to estimate the pose, and the results are returned in m_model_rotation_mat and m_model_translation_mat. Those two matrices are then plugged into OpenGL at line 360 in ARPanel.cpp.
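A sketch of turning a rotation R and translation t in the OpenCV camera convention (x right, y down, z forward) into column-major OpenGL matrices, plus a projection matrix built from the intrinsics fx, fy, cx, cy. It assumes the image origin is at the top-left and ignores half-pixel offsets; Mat4, modelviewFromPose, projectionFromIntrinsics and transform are hypothetical names, and this is not necessarily what ARPanel.cpp does.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<double, 16>;  // column-major: element (row, col) at index col * 4 + row

// Applies [R | t], then negates y and z, because the OpenGL camera looks down -Z with y up.
Mat4 modelviewFromPose(const double R[3][3], const double t[3]) {
    const double flip[3] = {1.0, -1.0, -1.0};
    Mat4 m{};  // zero-initialised
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) m[col * 4 + row] = flip[row] * R[row][col];
        m[12 + row] = flip[row] * t[row];
    }
    m[15] = 1.0;
    return m;
}

// Perspective projection for a pinhole camera (pixel units, origin top-left),
// image size width x height, clipping planes nearZ and farZ.
Mat4 projectionFromIntrinsics(double fx, double fy, double cx, double cy,
                              double width, double height, double nearZ, double farZ) {
    assert(farZ > nearZ && nearZ > 0.0);
    Mat4 p{};
    p[0] = 2.0 * fx / width;                       // (0, 0)
    p[5] = 2.0 * fy / height;                      // (1, 1)
    p[8] = 1.0 - 2.0 * cx / width;                 // (0, 2)
    p[9] = 2.0 * cy / height - 1.0;                // (1, 2)
    p[10] = -(farZ + nearZ) / (farZ - nearZ);      // (2, 2)
    p[11] = -1.0;                                  // (3, 2): w_clip = -z_eye
    p[14] = -2.0 * farZ * nearZ / (farZ - nearZ);  // (2, 3)
    return p;
}

// Multiplies a column-major 4x4 matrix by a homogeneous point.
std::array<double, 4> transform(const Mat4& m, const std::array<double, 4>& v) {
    std::array<double, 4> r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col) r[row] += m[col * 4 + row] * v[col];
    return r;
}
```

With legacy OpenGL these could be loaded via glMatrixMode(GL_PROJECTION) followed by glLoadMatrixd(p.data()), then glMatrixMode(GL_MODELVIEW) followed by glLoadMatrixd(m.data()). A point that projects to pixel (u, v) through K lands at normalised device coordinates (2u/width - 1, 1 - 2v/height).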

    1. The code is on this page. It’s not suitable for mobile applications because I use CUDA, which requires an NVIDIA card. You can, however, try to adapt the code for mobile.
