UPDATE: 28 June 2011
It turns out the SVD code I used does not work particularly well if the image undergoes significant rotation. I have experimented with porting GNU GSL Jacobi SVD code and still no go. My GPU does not support double type, which I suspect is fairly important for SVD calculations. Oh well, guess I’ll have to wait till I get my hands on a newer Nvidia card.
This week I thought it would be interesting to see how well the popular 2D Homography using RANSAC algorithm would perform on the GPU. My first hunch was that it should work well because each RANSAC iteration is completely independent, at least for the vanilla implementation. I decided to write a simple test program that would find the homography between two images and output the matching points. The program does the following:
1. Extract FAST corners in both images (using OpenCV)
2. Use the GPU to do a simple brute force match using the Sum of Absolute Difference (SAD) of each feature patch (16×16 block).
3. Use the GPU to do the RANSAC homography calculation and compare with OpenCV’s cv::findHomography function.
4. Output matches
Things sounded simple enough until I realised I eventually needed an SVD function (for calculating the homography) that could be called in a CUDA kernel. I ended up using C code from:
with some modifications to make it work in CUDA. I also remembered my GeForce 360M GTS does not support double types in CUDA code, so things are a bit unclear in regards to the reliability and accuracy of the SVD results. At this point I just crossed my fingers and hoped for the best, like any good engineer would do.
Results and discussion
I used two image pairs to test the program. Both pairs were originally taken for panoramic stitching. The camera was rotated about the centre (no significant translation), which makes using the homography possible for this example. Below shows the final results of each pair.
Timing results for Kryal castle:
-------------------------------- Load images: 27.847 ms Convert to greyscale: 2.556 ms FAST corners: 7.743 ms Feature matching (GPU): 818.109 ms <strong>RANSAC Homography (OpenCV): 24.791 ms RANSAC Homography (GPU): 10.149 ms</strong> Refine homography: 1.011 ms -------------------------------- Total time: 892.3 ms Features extracted: 1724 1560 OpenCV Inliers: 518 GPU Inliers: 402 GPU RANSAC iterations: 2875 RANSAC parameters: Confidence: 0.99 Inliers ratio: 0.2 Data is normalised: no Bias random selection: yes Homography matrix 1.31637 -0.0165917 -304.696 0.108898 1.22028 -73.6643 0.000389038 4.51187e-05 1
And timing results for the house image:
-------------------------------- Load images: 30.393 ms Convert to greyscale: 2.481 ms FAST corners: 6.126 ms Feature matching (GPU): 529.514 ms <strong>RANSAC Homography (OpenCV): 82.173 ms RANSAC Homography (GPU): 8.498 ms</strong> Refine homography: 0.607 ms -------------------------------- Total time: 659.896 ms Features extracted: 1635 1011 OpenCV Inliers: 195 GPU Inliers: 151 GPU RANSAC iterations: 2875 RANSAC parameters: Confidence: 0.99 Inliers ratio: 0.2 Data is normalised: no Bias random selection: yes Homography matrix 1.58939 0.119068 -438.437 0.187436 1.43225 -117.053 0.000693531 0.000185465 1
The GPU code is faster than OpenCV’s cv::findHomography function, which is always a good start to the day. The speed improvement varies (2x to 10x) because OpenCV’s RANSAC code will adapt the number of iterations depending on the highest number of inliers it has found at the current time. The maximum iteration is capped at 2000. Here’s a snippet of OpenCV’s RANSAC parameters from opencv/modules/calib3d/src/fundam.cpp line 222 onwards:
const double confidence = 0.995;
const int maxIters = 2000;
There are some obvious and some less so obvious parameters I have used in my code that I need to explain. My RANSAC code uses a fixed number of iterations based on confidence and expected inlier ratio, based on the theoretical equation from the Wikipedia. I set the inlier to 0.2, which means I’m expecting 20% of the data to be good and the remaining 80% to be bad/noise/crap. I’m playing it safe here since I made no attempts to do any filtering during the feature matching step. In a real application you would, because any bad data that can be pruned out early will make RANSAC jobs easier.
The next parameter is data normalisation when calculating the homography. According to the book Multiple View Geometry, aka ‘bible’, you should never skip this stage, because doing so will unleash the gates of hell. So out of curiosity I made it an option that I can turn off and on to see the effect. For some reason I get more inliers with it turned off?! Which just ain’t right … I’ve checked the code over and over and have not been able to figure out why that is. My normalisation code is the same one described in Multiple View Geometry. Below are the results with and without normalisation for the test images:
|No. Inliers with normalisation||No. Inliers without normalisation|
It would be good to test the same code on a newer GeForce that supports double type to see if the results are different.
The last parameter, ‘Bias random selection’, was a simple and quick idea that I threw in, hoping it would improve the RANSAC point selection process. The idea is to bias the random selection to pick point pairs that have a stronger matching score. I got the idea when I was reading about the PROSAC algorithm, an improvement over RANSAC when extra information is available. I used the SAD score of each feature pair from the feature matching stage and transformed it so a good match has a high score. In the beginning I saw improvements in the number of inliers, but after adding the homography refinement code, it gave the same results with or without. Still, I like the idea of having there, it gives me a warm and fuzzy feeling at night.
From the brief playing around with the code I wrote, I can safely say that a GPU version of RANSAC for homography calculation looks like a real winner. The speed improvement is significant enough to justify using it in the real world. On my laptop, the GPU can do over 300,000 RANSAC homography iterations a second. I’m hoping to use it on my old Markerless Augmented Reality project to get real-time speed without trying too hard.
A point worth mentioning, if your data for some reason has a lot of noise and requires more than 2000 iterations to find a usable solution then OpenCV’s cv::findHomography will probably fail. I find it odd it’s not parameter that the user can change.
The project can be compiled in CodeBlocks. The project assumes CUDA is installed at /usr/local/cuda. If not, you will need to right click on each *.cu file, click Properties, Advanced and change the custom command.
The two image pair I used can be found in the samples diretory. The program’s usage is:
CUDA_RANSAC_Homography [img.jpg] [target.jpg] [results.png]