Structure from motion using Bundler + CMVS/PMVS

I’ve rolled out my own structure from motion package, called RunSFM, which aims to make it simple to get Bundler and CMVS up and running on Linux. And by simple I mean a single make command to compile and a single call to run the entire process, no arguments required. Okay, the catch is you do have to install some packages beforehand, all of which are available via Synaptic in Ubuntu 11.04. The uncommon ones I’ve included for convenience.

RunSFM also includes speed optimisations I’ve made, like speeding up PMVS by up to 40% and parallel SIFT extraction and matching. No GPU is used, so most people should be able to use it. I decided not to use SiftGPU for now because I had a test case where it failed to find enough good matches compared to the SIFT binary by David Lowe.

You can check RunSFM out via Computer Vision -> Structure from Motion

Update on Markerless Augmented Reality stuff

Haven’t posted anything for the past few weeks. I’ve been busy working on this Markerless AR demo code, which should hopefully be released soon.

Recently, there has been a lot of interest in getting AR working on mobile phones (dubbed MAR, for Mobile Augmented Reality). The results from the papers I’ve found have been quite impressive. But I thought it would be just as fun to go the other way and try to get it running as fast as possible on a powerful PC. This allows more intensive processing while still running in real-time. Complex 3D models, animation, particle effects, sound, and maybe even some spare cycles for a frame of Crysis … that’s the idea anyway.

To do this I’ve been using the GPU where possible. So far the GPU is used to extract patch features, build the feature descriptors, and match features using a simple K-Tree structure on the GPU.

I’ve spent some time trying to put the homography calculation on the GPU, with poor results, probably because my card does not support double types, which may be necessary for SVD calculations. My laptop’s graphics card is an Nvidia GTS 360M. I can get 20-30 fps easily. I’d imagine with the latest and greatest Nvidia card in a desktop PC it can only go higher.

In the meantime, here’s a recent screenshot. The room was poorly lit, so the webcam ran at only about 5 fps, but the processing itself was capable of 33 fps or so. I’m hoping to increase that with some further GPU optimisation. I’d also like to point out it’s 33 fps WITHOUT any ‘real’ tracking code. I’ve only used a bounding box from previously matched frames to limit the feature search, so it’s pretty much running in semi-detection mode all the time. One of the main reasons I didn’t put in a tracker yet is that my poorly lit room makes the webcam very susceptible to motion blur, and blurring will easily kill any tracker. So it would be kind of moot to test.

I also got rid of the teapot and replaced it with a 3D model of Tux! which turned out to be a pain in the !@#$ a$$ to get up and running, but I’ll leave that rant for another post.

Markerless Augmented Reality
Markerless Augmented Reality Demo App on Nvidia GTS 360M (laptop GPU)



Update on RANSAC Homography on the GPU

Updating my previous post: I found a tiny bug when calculating the inliers. I still don’t know why the normalisation code produces fewer inliers though. With the fixed code, the inlier results without normalisation for Kryal castle and the house are 437 and 158 respectively, instead of the 402 and 151 reported before. The normalisation and bias random flags can be toggled in CUDA_RANSAC_Homography.h using the #defines at the top. The new code can be downloaded from the same link as before:

CUDA_RANSAC_Homography.tar.bz2 (1 MB)

The code was written on Linux using CodeBlocks. If you don’t use it, it should not be difficult to manually create a project in the IDE of your choice. If you try to compile on Windows you will have to remove the Linux-specific timing code.

RANSAC Homography on the GPU using CUDA

UPDATE: 28 June 2011

It turns out the SVD code I used does not work particularly well if the image undergoes significant rotation. I have experimented with porting GNU GSL Jacobi SVD code and still no go. My GPU does not support double type, which I suspect is fairly important for SVD calculations. Oh well, guess I’ll have to wait till I get my hands on a newer Nvidia card.

This week I thought it would be interesting to see how well the popular RANSAC-based 2D homography algorithm would perform on the GPU. My first hunch was that it should work well, because each RANSAC iteration is completely independent, at least in the vanilla implementation. I decided to write a simple test program that would find the homography between two images and output the matching points. The program does the following:

1. Extract FAST corners in both images (using OpenCV)
2. Use the GPU to do a simple brute force match using the Sum of Absolute Differences (SAD) of each feature patch (16×16 block).
3. Use the GPU to do the RANSAC homography calculation and compare with OpenCV’s cv::findHomography function.
4. Output matches
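Step 2 is easy to reason about on the CPU first. Here’s a minimal pure-Python sketch of brute-force SAD matching (the patches here are made-up random data, not real FAST corner patches; on the GPU each comparison becomes a thread):

```python
import random

PATCH = 16  # 16x16 pixel patch per feature, as in step 2

def sad(p, q):
    # Sum of Absolute Differences between two flattened patches
    return sum(abs(a - b) for a, b in zip(p, q))

def match_brute_force(patches_a, patches_b):
    # For every patch in image A, find the patch in image B with the
    # lowest SAD score (exhaustive search, no filtering).
    matches = []
    for i, pa in enumerate(patches_a):
        best_j = min(range(len(patches_b)), key=lambda j: sad(pa, patches_b[j]))
        matches.append((i, best_j, sad(pa, patches_b[best_j])))
    return matches

random.seed(0)
A = [[random.randrange(256) for _ in range(PATCH * PATCH)] for _ in range(5)]
B = [list(p) for p in A]  # identical patches -> each should match itself
matches = match_brute_force(A, B)
```

The GPU version does exactly this comparison, just massively in parallel, which is why it still dominates the timing results below.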

Things sounded simple enough until I realised I eventually needed an SVD function (for calculating the homography) that could be called in a CUDA kernel. I ended up using C code from:

with some modifications to make it work in CUDA. I also remembered my GeForce 360M GTS does not support double types in CUDA code, so things are a bit unclear regarding the reliability and accuracy of the SVD results. At this point I just crossed my fingers and hoped for the best, like any good engineer would.

Results and discussion

I used two image pairs to test the program. Both pairs were originally taken for panoramic stitching. The camera was rotated about its centre (no significant translation), which is what makes a homography a valid model for these examples. Below are the final results of each pair.
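The reason pure rotation makes a homography valid: two views from a rotating camera are related by H = K·R·K⁻¹, where K is the intrinsics matrix and R the relative rotation. A small pure-Python sketch (the focal length and image centre below are made-up values, not from my camera):

```python
import math

def mat_mul(A, B):
    # 3x3 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def homography_from_rotation(f, cx, cy, yaw):
    # K: pinhole intrinsics; R: rotation about the vertical axis by `yaw`
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    c, s = math.cos(yaw), math.sin(yaw)
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return mat_mul(mat_mul(K, R), K_inv)  # H = K R K^-1

H = homography_from_rotation(f=800.0, cx=640.0, cy=480.0, yaw=0.0)
```

Sanity check: with zero rotation H collapses to the identity, as you would expect.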

Kryal castle



Timing results for Kryal castle:

Load images: 27.847 ms
Convert to greyscale: 2.556 ms
FAST corners: 7.743 ms
Feature matching (GPU): 818.109 ms
RANSAC Homography (OpenCV): 24.791 ms
RANSAC Homography (GPU): 10.149 ms
Refine homography: 1.011 ms
Total time: 892.3 ms

Features extracted: 1724 1560
OpenCV Inliers: 518
GPU Inliers: 402
GPU RANSAC iterations: 2875

RANSAC parameters:
Confidence: 0.99
Inliers ratio: 0.2
Data is normalised: no
Bias random selection: yes

Homography matrix
1.31637 -0.0165917 -304.696
0.108898 1.22028 -73.6643
0.000389038 4.51187e-05 1

And timing results for the house image:

Load images: 30.393 ms
Convert to greyscale: 2.481 ms
FAST corners: 6.126 ms
Feature matching (GPU): 529.514 ms
RANSAC Homography (OpenCV): 82.173 ms
RANSAC Homography (GPU): 8.498 ms
Refine homography: 0.607 ms
Total time: 659.896 ms

Features extracted: 1635 1011
OpenCV Inliers: 195
GPU Inliers: 151
GPU RANSAC iterations: 2875

RANSAC parameters:
Confidence: 0.99
Inliers ratio: 0.2
Data is normalised: no
Bias random selection: yes

Homography matrix
1.58939 0.119068 -438.437
0.187436 1.43225 -117.053
0.000693531 0.000185465 1
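For readers new to homographies: the 3×3 matrix maps pixels projectively, meaning you divide by the third coordinate after the matrix multiply. A quick pure-Python sketch applying the house matrix above to an arbitrary pixel (100, 100):

```python
def apply_homography(H, x, y):
    # Projective mapping: [x', y', w]^T = H [x, y, 1]^T, then divide by w
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xp / w, yp / w

# Homography reported for the house pair
H = [[1.58939,     0.119068,    -438.437],
     [0.187436,    1.43225,     -117.053],
     [0.000693531, 0.000185465,  1.0]]

u, v = apply_homography(H, 100.0, 100.0)
```

The mapped x lands negative here, which is fine: the large translation term simply places this pixel outside the second image’s frame.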

The GPU code is faster than OpenCV’s cv::findHomography function, which is always a good start to the day. The speed improvement varies (2x to 10x) because OpenCV’s RANSAC code adapts the number of iterations based on the highest number of inliers it has found so far. The maximum number of iterations is capped at 2000. Here’s a snippet of OpenCV’s RANSAC parameters from opencv/modules/calib3d/src/fundam.cpp, line 222 onwards:

const double confidence = 0.995;
const int maxIters = 2000;

There are some obvious, and some less obvious, parameters I have used in my code that need explaining. My RANSAC code uses a fixed number of iterations derived from the confidence and the expected inlier ratio, using the theoretical equation from Wikipedia. I set the inlier ratio to 0.2, which means I’m expecting 20% of the data to be good and the remaining 80% to be bad/noise/crap. I’m playing it safe here, since I made no attempt to do any filtering during the feature matching step. In a real application you would, because any bad data that can be pruned out early makes RANSAC’s job easier.
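That theoretical equation is N = log(1 − p) / log(1 − wˢ), where p is the confidence, w the inlier ratio, and s the minimal sample size (4 point pairs for a homography). A quick check that it reproduces the 2875 iterations in the results above:

```python
import math

def ransac_iterations(confidence, inlier_ratio, sample_size):
    # Number of iterations needed so that, with probability `confidence`,
    # at least one sample of `sample_size` correspondences is all inliers.
    return math.log(1.0 - confidence) / math.log(1.0 - inlier_ratio ** sample_size)

n = ransac_iterations(confidence=0.99, inlier_ratio=0.2, sample_size=4)
# n is about 2876; truncating gives the 2875 reported above
print(int(n))
```

Note how brutal a 0.2 inlier ratio is: raising it to 0.5 drops the count to around 70 iterations, which is why pruning bad matches early pays off so much.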

The next parameter is data normalisation when calculating the homography. According to the book Multiple View Geometry, aka the ‘bible’, you should never skip this stage, because doing so will unleash the gates of hell. So out of curiosity I made it an option I can toggle to see the effect. For some reason I get more inliers with it turned off?! Which just ain’t right … I’ve checked the code over and over and have not been able to figure out why. My normalisation code is the same as the one described in Multiple View Geometry. Below are the results with and without normalisation for the test images:

              Inliers (with normalisation)   Inliers (without normalisation)
Kryal castle  320                            402
House         44                             151

It would be good to test the same code on a newer GeForce that supports double type to see if the results are different.
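For reference, the normalisation described in Multiple View Geometry (often called Hartley normalisation) translates the points so their centroid is at the origin and scales them so the average distance from the origin is √2. A pure-Python sketch:

```python
import math

def normalise_points(points):
    # Hartley normalisation: centroid -> origin, mean distance -> sqrt(2)
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2.0) / mean_dist
    # Similarity transform T such that p_norm = T * p (homogeneous coords);
    # T is needed later to denormalise the estimated homography
    T = [[s,   0.0, -s * cx],
         [0.0, s,   -s * cy],
         [0.0, 0.0, 1.0]]
    normed = [((x - cx) * s, (y - cy) * s) for x, y in points]
    return T, normed

T, normed = normalise_points([(10.0, 20.0), (300.0, 40.0), (150.0, 400.0)])
```

The point of this is numerical: pixel coordinates in the hundreds make the DLT matrix badly conditioned for SVD, which is exactly where a float-only GPU would be expected to struggle.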

The last parameter, ‘Bias random selection’, was a simple and quick idea I threw in, hoping it would improve the RANSAC point selection process. The idea is to bias the random selection towards point pairs that have a stronger matching score. I got the idea while reading about the PROSAC algorithm, an improvement over RANSAC for when extra information is available. I used the SAD score of each feature pair from the feature matching stage and transformed it so that a good match has a high score. In the beginning I saw improvements in the number of inliers, but after adding the homography refinement code it gave the same results with or without it. Still, I like having it there; it gives me a warm and fuzzy feeling at night.
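The biased selection amounts to weighted sampling: invert each pair’s SAD score into a weight (low SAD = good match = high weight) and draw the 4 correspondences for each RANSAC sample from that distribution. A pure-Python sketch with made-up scores (the exact weight transform is my own choice here):

```python
import random

def biased_sample(sad_scores, k=4):
    # Low SAD = good match, so invert the score into a selection weight
    max_sad = max(sad_scores)
    weights = [max_sad - s + 1 for s in sad_scores]
    chosen = set()
    while len(chosen) < k:
        # random.choices draws with replacement; loop until k distinct picks
        chosen.add(random.choices(range(len(sad_scores)), weights=weights)[0])
    return sorted(chosen)

random.seed(42)
sample = biased_sample([120, 15, 300, 42, 8, 77, 250, 33])
```

Good matches get picked first more often, so all-inlier samples tend to turn up in earlier iterations, which is the core PROSAC intuition.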


From briefly playing around with the code I wrote, I can safely say that a GPU version of RANSAC for homography calculation looks like a real winner. The speed improvement is significant enough to justify using it in the real world. On my laptop, the GPU can do over 300,000 RANSAC homography iterations a second. I’m hoping to use it on my old Markerless Augmented Reality project to get real-time speed without trying too hard.

A point worth mentioning: if your data for some reason has a lot of noise and requires more than 2000 iterations to find a usable solution, then OpenCV’s cv::findHomography will probably fail. I find it odd that this is not a parameter the user can change.



The project can be compiled in CodeBlocks. The project assumes CUDA is installed at /usr/local/cuda. If not, you will need to right click on each *.cu file, click Properties, Advanced and change the custom command.

The two image pairs I used can be found in the samples directory. The program’s usage is:

CUDA_RANSAC_Homography [img.jpg] [target.jpg] [results.png]

Simple Pano Stitcher

In the past I’ve received emails from people on the OpenCV Yahoo group asking for help with panoramic stitching, which I answered to the best of my knowledge, though the answers were theoretical. Being the practical man I like to think I am, I decided to finally get down and dirty and write a simple panoramic stitcher that should help beginners venturing into panoramic stitching. In the process I learned some new things along the way. I believe OpenCV is scheduled to include some sort of panoramic stitching engine in a future release. In the meantime you can check out my Simple Pano Stitcher program here:

Last update: 31/03/2014


It uses OpenCV to do fundamental image processing stuff and includes a public domain Levenberg-Marquardt non-linear solver.

To compile the code just type make, then run bin/Release/SimplePanoStitcher. It will write out a bunch of layer-xxx.png images. I’ve included sample images in the sample directory, which the program uses by default. Edit the source code to load your own images. You’ll need to know the focal length in pixels used to capture each image.

To keep things simple I only wrote the bare minimum to get it up and running, for more information check the comments in main.cpp at the top of the file.

Results with Simple Pano Stitcher

Below is a sample result I got after combining all the layers by simply pasting them on top of each other. No fancy blending or anything. On close inspection there is some noticeable parallax. If I had shot on a tripod instead of hand held it would be less noticeable.

Sample results from Simple Pano Stitcher
Sample results from Simple Pano Stitcher


One of the crucial steps to getting good results is the non-linear optimisation step. I decided to optimise the focal length, yaw, pitch, and roll for each image: a total of 4 parameters per image. I thought it would be interesting to compare results with and without optimisation. For the comparison I’ll be using the full size images (3872 x 2592), which are not included in the sample directory to keep the package size reasonable. Below is a close-up of the roof of the house without optimisation.
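The 4 parameters enter the model roughly like this: a pixel is back-projected to a 3D ray using the focal length, then rotated by that image’s yaw/pitch/roll into a common panorama frame, where overlapping images can be compared. A minimal sketch; the angle convention and the focal length value here are my own assumptions, not what the stitcher necessarily uses:

```python
import math

def rotation_ypr(yaw, pitch, roll):
    # One possible convention: R = Rz(roll) * Rx(pitch) * Ry(yaw)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    Ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    Rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    Rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    mul = lambda A, B: [[sum(A[i][k] * B[k][j] for k in range(3))
                         for j in range(3)] for i in range(3)]
    return mul(Rz, mul(Rx, Ry))

def pixel_to_ray(x, y, f, cx, cy, R):
    # Back-project pixel to a 3D ray, then rotate into panorama space
    v = [(x - cx) / f, (y - cy) / f, 1.0]
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

# Centre pixel of a 3872x2592 image with zero rotation maps straight
# down the optical axis (f=3000 px is a made-up focal length)
ray = pixel_to_ray(1936.0, 1296.0, f=3000.0, cx=1936.0, cy=1296.0,
                   R=rotation_ypr(0.0, 0.0, 0.0))
```

The optimiser then tweaks f, yaw, pitch, and roll per image so that matched features from different images land on (nearly) the same ray, which is what drives the RMS error down.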

Panoramic stitching without non-linear optimisation

The root mean squared error reported is 22.5008 pixels. The results are without a doubt awful! Looking at it too long would cause me to require new glasses with a stronger optical prescription. I’ll now turn on the non-linear optimisation step for the 4 parameters mentioned previously. Below is what we get.

Panoramic stitching with optimisation
Panoramic stitching with non-linear optimisation

The reported root mean squared error is 3.84413 pixels. Much better! Not perfect, but much more pleasant to look at than the previous image.

There is no doubt a vast improvement from introducing non-linear optimisation to panoramic stitching. One can improve the results further by taking radial distortion into account, but I’ll leave that as an exercise for the reader.

I originally got the focal length for my camera lens from calibration using the sample program (calibration.cpp) in the OpenCV 2.x directory, which performs a standard calibration using a checkerboard pattern. Yet even then its accuracy is still a bit off. This got me thinking: can we use panoramic stitching to refine the focal length? What are the pros and cons? The images used in the stitching process should ideally be of scenery with features that are very far away, so that parallax is not an issue, such as an outdoor landscape of mountains. It seems obvious enough, and there are probably papers out there, but I’m too lazy to check at this moment …

OpenCV port of ‘Robust Pose Estimation from a Planar Target’

Please see this page for the most up to date information regarding this post and code.

A few months ago I wrote an OpenCV version of the pose estimation algorithm found in the paper Robust Pose Estimation from a Planar Target (2006) by Gerald Schweighofer and Axel Pinz using their Matlab code from the link in the paper.

I originally ported it so I could play around with Augmented Reality, specifically to track a planar marker (a business card). Although that project hasn’t received much attention lately, I figured someone might find the pose estimation part useful. I was about to post it up last night when I came across a small bug, which took 2-3 hours to hunt down. It seems the new OpenCV Mat class has some subtle gotchas, but all good now. I checked my results against the original Matlab code and they appear numerically the same, which is always a good sign.


Look at demo.cpp to see how it works.

UPDATE: 12 March 2012

If you plan to use this for Augmented Reality applications then you should modify Line 23 in RPP.cpp from:

ObjPose(model_3D, iprts, NULL, &Rlu, &tlu, &it1, &obj_err1, &img_err1);

to:

ObjPose(model_3D, iprts, &Rlu, &Rlu, &tlu, &it1, &obj_err1, &img_err1);

This initialises the rotation matrix guess with the last estimate, providing smoother and more accurate results.