Added a dedicated page for the CUDA RANSAC Homography code, under the Computer Vision section.
I’m in the process of restructuring the website a bit, moving some projects posted on the main blog page to dedicated pages.
I’ve updated the CUDA RANSAC Homography code. It now uses OpenCV’s SURF GPU implementation instead of FAST. The CUDA kernel-based SVD code has been replaced by a port of GNU GSL’s SVD function. The CUDA SVD code is generic enough that you can easily include it in an existing project, but it’s probably limited to small matrices. You can download it here: CUDA_RANSAC_Homography_1.1.tar.bz2.
I’ve rolled out my own structure from motion package, called RunSFM, that aims to make it simple to get Bundler and CMVS up and running on Linux. And by simple I mean a single make command to compile and a single call to RunSFM.sh to do the entire process, no arguments required. Okay, the catch is you do have to install some packages beforehand, all of which are in Synaptic on Ubuntu 11.04. The uncommon ones I’ve included for convenience.
RunSFM also includes speed optimisations I’ve made, like speeding up PMVS by up to 40% and parallel SIFT extraction and matching. No GPU is used, so most people should be able to use it. I decided not to use SiftGPU for now because I had a test case where it failed to find enough good matches compared to the SIFT binary by David Lowe.
You can check RunSFM out via Computer Vision -> Structure from Motion.
I’ve finally uploaded the Markerless AR stuff I’ve been working on for the past few months. There’s still more work to be done. At the moment it’s just meant to be a demo application that you can play around with. I don’t have any immediate plans to polish it up and package it as an API or anything. You can download it from this page.
Haven’t posted anything for the past few weeks. I’ve been busy working on this Markerless AR demo code, which should hopefully be released soon.
Recently, there has been a lot of interest in getting AR working on mobile phones (dubbed MAR, for Mobile Augmented Reality). The results from the papers I’ve found have been quite impressive. But I thought it would be just as fun going the other way and trying to get it running as fast as possible on a powerful PC. This allows the user to do more intensive processing and yet run in real-time. Complex 3D models, animation, particle effects, sound, and maybe even some spare cycles for a frame of Crysis … that’s the idea anyway.
To do this I’ve been using the GPU where possible. So far the GPU is used to: extract patch features, build the feature descriptor, match features using a simple K-Tree structure on the GPU.
I’ve spent some time trying to put the homography calculation on the GPU, with poor results, probably because my card does not support double types, which may be necessary for SVD calculations. My laptop’s graphics card is an Nvidia GTS 360M. I can get 20-30 fps easily. I’d imagine with the latest and greatest Nvidia card on a desktop PC it can only go higher.
In the meantime, here’s a recent screenshot. The room was poorly lit, so the webcam only ran at around 5 fps. But the processing itself was capable of 33 fps or so. I’m hoping to increase that with some further GPU optimisation. I’d also like to point out it’s 33 fps WITHOUT any ‘real’ tracking code. I’ve only used a bounding box from previously matched frames to limit the feature search. It’s pretty much running in semi-detection mode all the time. One of the main reasons I didn’t put in a tracker yet is that my room is poorly lit, which makes the webcam very susceptible to blurring. The blurring will easily kill any tracker, so it would be kinda moot to test.
I also got rid of the teapot and replaced it with a 3D model of Tux, which turned out to be a pain in the !@#$ a$$ to get up and running, but I’ll leave that rant for another post.
Updating my previous post: I found a tiny bug when calculating the inliers. I still don’t know why the normalisation code produces fewer inliers though. With the fixed code, the inlier results without normalisation for Kryal castle and the house are 437 and 158 respectively, instead of the 402 and 151 reported before. The normalisation and bias random flags can be toggled in CUDA_RANSAC_Homography.h using the #defines at the top. The new code can be downloaded from the same link as before:
The code was written on Linux using CodeBlocks. If you don’t have both, it should not be difficult to manually create a project in the IDE of your choice. If you try to compile on Windows, you will have to get rid of the Linux-specific timing code.
UPDATE: 28 June 2011
It turns out the SVD code I used does not work particularly well if the image undergoes significant rotation. I have experimented with porting GNU GSL’s Jacobi SVD code, and still no go. My GPU does not support the double type, which I suspect is fairly important for SVD calculations. Oh well, guess I’ll have to wait till I get my hands on a newer Nvidia card.
This week I thought it would be interesting to see how well the popular RANSAC-based 2D homography estimation would perform on the GPU. My first hunch was that it should work well, because each RANSAC iteration is completely independent, at least for the vanilla implementation. I decided to write a simple test program that would find the homography between two images and output the matching points. The program does the following:
1. Extract FAST corners in both images (using OpenCV)
2. Use the GPU to do a simple brute force match using the Sum of Absolute Difference (SAD) of each feature patch (16×16 block).
3. Use the GPU to do the RANSAC homography calculation and compare with OpenCV’s cv::findHomography function.
4. Output matches
Things sounded simple enough until I realised I needed an SVD function (for calculating the homography) that could be called inside a CUDA kernel. I ended up using C code from:
with some modifications to make it work in CUDA. I also remembered my GeForce GTS 360M does not support double types in CUDA code, so things are a bit unclear regarding the reliability and accuracy of the SVD results. At this point I just crossed my fingers and hoped for the best, like any good engineer would do.
Results and discussion
I used two image pairs to test the program. Both pairs were originally taken for panoramic stitching. The camera was rotated about its centre (no significant translation), which makes using a homography valid for this example. Below are the final results for each pair.
Timing results for Kryal castle:
--------------------------------
Load images: 27.847 ms
Convert to greyscale: 2.556 ms
FAST corners: 7.743 ms
Feature matching (GPU): 818.109 ms
RANSAC Homography (OpenCV): 24.791 ms
RANSAC Homography (GPU): 10.149 ms
Refine homography: 1.011 ms
--------------------------------
Total time: 892.3 ms

Features extracted: 1724 1560
OpenCV Inliers: 518
GPU Inliers: 402
GPU RANSAC iterations: 2875

RANSAC parameters:
Confidence: 0.99
Inliers ratio: 0.2
Data is normalised: no
Bias random selection: yes

Homography matrix:
 1.31637      -0.0165917   -304.696
 0.108898      1.22028     -73.6643
 0.000389038   4.51187e-05  1
And timing results for the house image:
--------------------------------
Load images: 30.393 ms
Convert to greyscale: 2.481 ms
FAST corners: 6.126 ms
Feature matching (GPU): 529.514 ms
RANSAC Homography (OpenCV): 82.173 ms
RANSAC Homography (GPU): 8.498 ms
Refine homography: 0.607 ms
--------------------------------
Total time: 659.896 ms

Features extracted: 1635 1011
OpenCV Inliers: 195
GPU Inliers: 151
GPU RANSAC iterations: 2875

RANSAC parameters:
Confidence: 0.99
Inliers ratio: 0.2
Data is normalised: no
Bias random selection: yes

Homography matrix:
 1.58939      0.119068     -438.437
 0.187436     1.43225      -117.053
 0.000693531  0.000185465   1
The GPU code is faster than OpenCV’s cv::findHomography function, which is always a good start to the day. The speed improvement varies (2x to 10x) because OpenCV’s RANSAC code adapts the number of iterations based on the highest number of inliers found so far. The maximum number of iterations is capped at 2000. Here’s a snippet of OpenCV’s RANSAC parameters from opencv/modules/calib3d/src/fundam.cpp, line 222 onwards:
const double confidence = 0.995;
const int maxIters = 2000;
There are some obvious, and some less obvious, parameters I have used in my code that need explaining. My RANSAC code uses a fixed number of iterations derived from the confidence and expected inlier ratio, based on the theoretical equation from Wikipedia. I set the inlier ratio to 0.2, which means I’m expecting 20% of the data to be good and the remaining 80% to be bad/noise/crap. I’m playing it safe here, since I made no attempt to do any filtering during the feature matching step. In a real application you would, because any bad data that can be pruned out early makes RANSAC’s job easier.
The next parameter is data normalisation when calculating the homography. According to the book Multiple View Geometry, aka the ‘bible’, you should never skip this stage, because doing so will unleash the gates of hell. So out of curiosity I made it an option I can toggle to see the effect. For some reason I get more inliers with it turned off?! Which just ain’t right … I’ve checked the code over and over and have not been able to figure out why that is. My normalisation code is the one described in Multiple View Geometry. Below are the results with and without normalisation for the test images:
No. Inliers with normalisation | No. Inliers without normalisation
It would be good to test the same code on a newer GeForce that supports double type to see if the results are different.
The last parameter, ‘Bias random selection’, was a simple and quick idea I threw in, hoping it would improve the RANSAC point selection process. The idea is to bias the random selection towards point pairs with a stronger matching score. I got the idea while reading about the PROSAC algorithm, an improvement over RANSAC for when extra information is available. I used the SAD score of each feature pair from the feature matching stage and transformed it so that a good match has a high score. In the beginning I saw improvements in the number of inliers, but after adding the homography refinement code it gave the same results with or without. Still, I like the idea of having it there; it gives me a warm and fuzzy feeling at night.
From briefly playing around with the code I wrote, I can safely say that a GPU version of RANSAC for homography calculation looks like a real winner. The speed improvement is significant enough to justify using it in the real world. On my laptop, the GPU can do over 300,000 RANSAC homography iterations a second. I’m hoping to use it on my old Markerless Augmented Reality project to get real-time speed without trying too hard.
A point worth mentioning: if your data for some reason has a lot of noise and requires more than 2000 iterations to find a usable solution, then OpenCV’s cv::findHomography will probably fail. I find it odd that this isn’t a parameter the user can change.
The project can be compiled in CodeBlocks. It assumes CUDA is installed at /usr/local/cuda. If not, you will need to right click on each *.cu file, click Properties -> Advanced, and change the custom command.
The two image pairs I used can be found in the samples directory. The program’s usage is:
CUDA_RANSAC_Homography [img.jpg] [target.jpg] [results.png]
In the past I’ve received emails from people on the OpenCV Yahoo group asking for help with panoramic stitching, which I answered to the best of my knowledge, though the answers were theoretical. Being the practical man I like to think I am, I decided to finally get down and dirty and write a simple panoramic stitcher that should help beginners venturing into panoramic stitching. In the process I learned some new things along the way. I believe OpenCV is scheduled to include some sort of panoramic stitching engine in a future release. In the meantime you can check out my Simple Pano Stitcher program here:
Last update: 31/03/2014
It uses OpenCV to do fundamental image processing stuff and includes a public domain Levenberg-Marquardt non-linear solver.
To compile the code, just type make and run bin/Release/SimplePanoStitcher. It will output a bunch of layer-xxx.png images. I’ve included sample images in the sample directory, which the program uses by default. Edit the source code to load your own images. You’ll need to know the focal length in pixels used to capture each image.
To keep things simple I only wrote the bare minimum to get it up and running. For more information, check the comments at the top of main.cpp.
Results with Simple Pano Stitcher
Below is a sample result I got after combining all the layers by simply pasting them on top of each other. No fancy blending or anything. On close inspection there is some noticeable parallax. If I had taken the shots on a tripod instead of hand held it would be less noticeable.
One of the crucial steps to getting good results is the non-linear optimisation step. I decided to optimise the focal length, yaw, pitch, and roll for each image. A total of 4 parameters. I thought it would be interesting to compare results with and without any optimisation. For comparison I’ll be using the full size images (3872 x 2592), which are not included in the sample directory to keep the package size reasonable. Below shows a close-up section of the roof of the house without optimisation.
The root mean squared error reported is 22.5008 pixels. The results are without a doubt awful! Looking at it too long would cause me to require new glasses with a stronger optical prescription. I’ll now turn on the non-linear optimisation step for the 4 parameters mentioned previously. Below is what we get.
The reported root mean squared error is 3.84413 pixels. Much better! Not perfect, but much more pleasant to look at than the previous image.
There is no doubt a vast improvement from introducing non-linear optimisation to panoramic stitching. One can improve the results further by taking radial distortion into consideration, but I’ll leave that as an exercise for the reader.
I originally got the focal length for my camera lens from calibration using the sample program (calibration.cpp) in the OpenCV 2.x directory. It performs a standard calibration using a checkerboard pattern. Yet even then its accuracy is still a bit off. This got me thinking: can we use panoramic stitching to refine the focal length? What are the pros and cons? The images used in the stitching process should ideally be scenery with features that are very far away, so that parallax is not an issue, such as an outdoor landscape of mountains or something. Seems obvious enough; there are probably papers out there, but I’m too lazy to check at this moment …
Please see this page for the most up to date information regarding this post and code.
A few months ago I wrote an OpenCV version of the pose estimation algorithm found in the paper Robust Pose Estimation from a Planar Target (2006) by Gerald Schweighofer and Axel Pinz using their Matlab code from the link in the paper.
I originally ported it so I could play around with Augmented Reality, specifically to track a planar marker (a business card). Although that project hasn’t received much attention lately, I figured someone might find the pose estimation part useful. I was about to post it up last night when I came across a small bug, which took 2-3 hours to hunt down. Seems like the new OpenCV Mat class has some subtle gotchas, but all good now. I checked my results against the original Matlab code and they appear numerically the same, which is always a good sign.
Look at demo.cpp to see how it works.
UPDATE: 12 March 2012
If you plan to use this for Augmented Reality applications then you should modify Line 23 in RPP.cpp from:
ObjPose(model_3D, iprts, NULL, &Rlu, &tlu, &it1, &obj_err1, &img_err1);
to:
ObjPose(model_3D, iprts, &Rlu, &Rlu, &tlu, &it1, &obj_err1, &img_err1);
This initialises the guess of the rotation matrix to the last estimate, providing smoother and more accurate results.