RANSAC Homography on the GPU using CUDA

UPDATE: 28 June 2011

It turns out the SVD code I used does not work particularly well if the image undergoes significant rotation. I have experimented with porting the GNU GSL Jacobi SVD code, but still no go. My GPU does not support the double type, which I suspect is fairly important for SVD calculations. Oh well, guess I’ll have to wait till I get my hands on a newer Nvidia card.

This week I thought it would be interesting to see how well the popular 2D Homography using RANSAC algorithm would perform on the GPU. My first hunch was that it should work well because each RANSAC iteration is completely independent, at least for the vanilla implementation. I decided to write a simple test program that would find the homography between two images and output the matching points. The program does the following:

1. Extract FAST corners in both images (using OpenCV)
2. Use the GPU to do a simple brute force match using the Sum of Absolute Difference (SAD) of each feature patch (16×16 block).
3. Use the GPU to do the RANSAC homography calculation and compare with OpenCV’s cv::findHomography function.
4. Output matches
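The brute-force matching in step 2 boils down to scoring every feature patch in one image against every patch in the other. The scoring function itself is simple; a minimal CPU sketch (the GPU kernel just evaluates this for all candidate pairs in parallel; function and parameter names are mine) might look like:

```cpp
#include <cstdint>
#include <cstdlib>

// Sum of Absolute Differences between two 16x16 greyscale patches.
// Lower score = more similar. The brute-force matcher evaluates this
// for every feature pair and keeps the best match per feature.
int PatchSAD(const uint8_t a[16 * 16], const uint8_t b[16 * 16]) {
    int sum = 0;
    for (int i = 0; i < 16 * 16; i++) {
        sum += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    }
    return sum;
}
```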

Things sounded simple enough until I realised I eventually needed an SVD function (for calculating the homography) that could be called in a CUDA kernel. I ended up using C code from:


with some modifications to make it work in CUDA. I also remembered my GeForce GTS 360M does not support double types in CUDA code, so things are a bit unclear with regard to the reliability and accuracy of the SVD results. At this point I just crossed my fingers and hoped for the best, like any good engineer would do.
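For context, the reason an SVD is needed at all: the homography is estimated via the Direct Linear Transform (DLT), where each point correspondence contributes two rows to a matrix A, and the homography is the right singular vector of A with the smallest singular value. The SVD itself is the hard part I had to port; building A is straightforward. A hedged sketch of one correspondence’s contribution (my own function name):

```cpp
// Build the two DLT rows contributed by one correspondence (x,y) -> (u,v).
// Stacking four such pairs gives an 8x9 matrix A; the homography h
// (9-vector, row-major 3x3) is the right singular vector of A with the
// smallest singular value, i.e. the approximate solution of A h = 0.
void DLTRows(double x, double y, double u, double v,
             double row1[9], double row2[9]) {
    double r1[9] = { -x, -y, -1,  0,  0,  0, u * x, u * y, u };
    double r2[9] = {  0,  0,  0, -x, -y, -1, v * x, v * y, v };
    for (int i = 0; i < 9; i++) { row1[i] = r1[i]; row2[i] = r2[i]; }
}
```

As a sanity check, the identity homography should satisfy A h = 0 exactly when every point maps to itself.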

Results and discussion

I used two image pairs to test the program. Both pairs were originally taken for panoramic stitching. The camera was rotated about the centre (no significant translation), which makes using a homography valid for this example. Below are the final results for each pair.

Kryal castle



Timing results for Kryal castle:

Load images: 27.847 ms
Convert to greyscale: 2.556 ms
FAST corners: 7.743 ms
Feature matching (GPU): 818.109 ms
RANSAC Homography (OpenCV): 24.791 ms
RANSAC Homography (GPU): 10.149 ms
Refine homography: 1.011 ms
Total time: 892.3 ms

Features extracted: 1724 1560
OpenCV Inliers: 518
GPU Inliers: 402
GPU RANSAC iterations: 2875

RANSAC parameters:
Confidence: 0.99
Inliers ratio: 0.2
Data is normalised: no
Bias random selection: yes

Homography matrix
1.31637 -0.0165917 -304.696
0.108898 1.22028 -73.6643
0.000389038 4.51187e-05 1

And timing results for the house image:

Load images: 30.393 ms
Convert to greyscale: 2.481 ms
FAST corners: 6.126 ms
Feature matching (GPU): 529.514 ms
RANSAC Homography (OpenCV): 82.173 ms
RANSAC Homography (GPU): 8.498 ms
Refine homography: 0.607 ms
Total time: 659.896 ms

Features extracted: 1635 1011
OpenCV Inliers: 195
GPU Inliers: 151
GPU RANSAC iterations: 2875

RANSAC parameters:
Confidence: 0.99
Inliers ratio: 0.2
Data is normalised: no
Bias random selection: yes

Homography matrix
1.58939 0.119068 -438.437
0.187436 1.43225 -117.053
0.000693531 0.000185465 1

The GPU code is faster than OpenCV’s cv::findHomography function, which is always a good start to the day. The speed improvement varies (2x to 10x) because OpenCV’s RANSAC code adapts the number of iterations depending on the highest number of inliers it has found so far. The maximum number of iterations is capped at 2000. Here’s a snippet of OpenCV’s RANSAC parameters from opencv/modules/calib3d/src/fundam.cpp, line 222 onwards:

const double confidence = 0.995;
const int maxIters = 2000;

There are some obvious, and some less obvious, parameters I have used in my code that need explaining. My RANSAC code uses a fixed number of iterations derived from the desired confidence and the expected inlier ratio, using the theoretical equation from Wikipedia. I set the inlier ratio to 0.2, which means I’m expecting 20% of the data to be good and the remaining 80% to be bad/noise/crap. I’m playing it safe here since I made no attempt to do any filtering during the feature matching step. In a real application you would, because any bad data that can be pruned out early makes RANSAC’s job easier.
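That fixed iteration count comes from the standard RANSAC equation N = log(1 − p) / log(1 − wˢ), where p is the confidence, w the expected inlier ratio, and s = 4 the number of point pairs needed for a homography. A small sketch (function and parameter names are mine):

```cpp
#include <cmath>

// Number of RANSAC iterations needed so that, with probability
// `confidence`, at least one sample of `sample_size` points is
// all inliers, given an expected inlier ratio `inlier_ratio`.
// For a homography, sample_size = 4.
int RansacIterations(double confidence, double inlier_ratio, int sample_size) {
    double p_good_sample = std::pow(inlier_ratio, sample_size);
    return static_cast<int>(std::log(1.0 - confidence) /
                            std::log(1.0 - p_good_sample));
}
```

With confidence = 0.99 and an inlier ratio of 0.2, this gives the 2875 iterations reported in the results above.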

The next parameter is data normalisation when calculating the homography. According to the book Multiple View Geometry, aka the ‘bible’, you should never skip this stage, because doing so will unleash the gates of hell. So out of curiosity I made it an option I can turn on and off to see the effect. For some reason I get more inliers with it turned off?! Which just ain’t right … I’ve checked the code over and over and have not been able to figure out why that is. My normalisation code is the same as the one described in Multiple View Geometry. Below are the results with and without normalisation for the test images:

              Inliers with normalisation   Inliers without normalisation
Kryal castle  320                          402
House         44                           151
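For reference, the normalisation in question is the standard preconditioning from Multiple View Geometry: translate the points so their centroid is at the origin, then scale so the average distance from the origin is √2. A sketch of the idea (my own variable names, CPU side):

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Normalise points as described in Multiple View Geometry: shift the
// centroid to the origin and scale so the mean distance from the origin
// is sqrt(2). This conditions the DLT matrix before the SVD. Outputs
// the similarity transform (scale s, translation tx, ty) applied.
void NormalisePoints(std::vector<Pt>& pts, double& tx, double& ty, double& s) {
    double cx = 0, cy = 0;
    for (const Pt& p : pts) { cx += p.x; cy += p.y; }
    cx /= pts.size();
    cy /= pts.size();

    double mean_dist = 0;
    for (const Pt& p : pts)
        mean_dist += std::sqrt((p.x - cx) * (p.x - cx) +
                               (p.y - cy) * (p.y - cy));
    mean_dist /= pts.size();

    s = std::sqrt(2.0) / mean_dist;
    tx = -cx * s;
    ty = -cy * s;
    for (Pt& p : pts) { p.x = p.x * s + tx; p.y = p.y * s + ty; }
}
```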

It would be good to test the same code on a newer GeForce that supports double type to see if the results are different.

The last parameter, ‘Bias random selection’, was a simple and quick idea that I threw in, hoping it would improve the RANSAC point selection process. The idea is to bias the random selection towards point pairs that have a stronger matching score. I got the idea when I was reading about the PROSAC algorithm, an improvement over RANSAC for when extra information is available. I used the SAD score of each feature pair from the feature matching stage and transformed it so that a good match has a high score. In the beginning I saw improvements in the number of inliers, but after adding the homography refinement code, it gave the same results with or without it. Still, I like the idea of having it there; it gives me a warm and fuzzy feeling at night.
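One way to implement biased selection is to sample indices with probability proportional to the match score rather than uniformly, using a cumulative-sum table and a binary search per draw. A sketch of the idea (hypothetical helpers, not the exact kernel code):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Running totals of the scores; each draw becomes a binary search.
std::vector<double> CumulativeScores(const std::vector<double>& scores) {
    std::vector<double> cumulative(scores.size());
    std::partial_sum(scores.begin(), scores.end(), cumulative.begin());
    return cumulative;
}

// Pick an index with probability proportional to its match score.
int BiasedPick(const std::vector<double>& cumulative, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(0.0, cumulative.back());
    double r = dist(rng);
    return static_cast<int>(
        std::upper_bound(cumulative.begin(), cumulative.end(), r) -
        cumulative.begin());
}
```

Points with higher scores then come up more often, so all-inlier samples tend to be found in fewer iterations, which is the same intuition behind PROSAC.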


From the brief playing around with the code I wrote, I can safely say that a GPU version of RANSAC for homography calculation looks like a real winner. The speed improvement is significant enough to justify using it in the real world. On my laptop, the GPU can do over 300,000 RANSAC homography iterations a second. I’m hoping to use it on my old Markerless Augmented Reality project to get real-time speed without trying too hard.

A point worth mentioning: if your data for some reason has a lot of noise and requires more than 2000 iterations to find a usable solution, then OpenCV’s cv::findHomography will probably fail. I find it odd that it’s not a parameter the user can change.



The project can be compiled in Code::Blocks. The project assumes CUDA is installed at /usr/local/cuda. If not, you will need to right-click on each *.cu file, click Properties, then Advanced, and change the custom build command.

The two image pairs I used can be found in the samples directory. The program’s usage is:

CUDA_RANSAC_Homography [img.jpg] [target.jpg] [results.png]

20 thoughts on “RANSAC Homography on the GPU using CUDA”

  1. Hello, I have read the code in your paper. I want to know the meaning of the parameter match_score in the CUDA function CUDA_BruteForceMatching(). I have thought about it for a long time but still have no idea. I am looking forward to your reply.
    Thank you!

    1. Hi,

      It’s just a score for how similar a given feature pair is, or how well they match. Higher score = very similar/good match. It was a simple idea I experimented with: rather than making RANSAC pick completely random points, it biases towards the better points with higher scores.

  2. Thanks for your reply!
    I want to know whether the CUDA function can work with SIFT features or not. I have now got the matched point pairs; how can I use your CUDA function CUDA_RANSAC_Homography(), especially parameter 3? Is it related to the SIFT descriptor?

  3. What relationship does parameter 3 of the CUDA function CUDA_RANSAC_Homography have with the source and destination match pointers?

    1. match_score is the same length as src/dst points. If you want to take advantage of it for SIFT you’ll need to know the distance/score between each SIFT matched pair. The original SIFT used Euclidean distance, I think. Assuming the SIFT feature vectors are normalised to unit length, then the best match is when the distance is 0 and the worst sqrt(2). So you would do something like

      for (int i = 0; i < src.size(); i++) {
          match_score[i] = sqrt(2) - SIFT_pair_distance[i]; // invert the value
      }

      Or if you don’t want to use match_score, comment out #define BIAS_RANDOM_SELECTION in CUDA_RANSAC_Homography.h. You’ll also need to comment out:

      assert(match_score.size() == dst.size());

      on line 324 of CUDA_RANSAC_Homography.cu; that should have been wrapped in an #ifdef.

  4. Thanks!
    I have made it work. I really did not want to use the parameter match_score, so I deleted it, and I got the result matrix.

  5. The resulting 3×3 matrix is wrong, and the final result I get is not the same as OpenCV’s. Can you give me two pictures and their result matrix?

  6. I got the matched point pairs and passed them to OpenCV’s homography function, and it gets the right matrix. But when I pass the same point pairs to your function, the returned result is wrong. Do the point pair parameters in your function differ from OpenCV’s?

    1. The code is not reliable, as mentioned at the top of the page 🙂 I have not spent more time looking at it further yet. You should not use it for anything important. I need to find an implementation of SVD that will work for 32bit floating point values only.

  7. Hello, I’ve been trying out the code on Windows, but it seems that Windows is just allergic to std::vectors. The KeyPoint vectors, where the features found through FAST are written, just won’t work on Windows. Do I need to initialise them before passing them to FAST? Otherwise it says “exception writing to location….”

    1. I think you might have stumbled onto a more serious problem. On Windows, there are issues with passing STL vectors between DLLs. I’ve had a situation where the Debug version of the DLL works, but not the Release version. I haven’t been able to find anything on the OpenCV developer site though.

  8. Hi,
    I have a stupid question!
    It was only yesterday that I first heard about “RANSAC”. I started to read more about it and I understand what it is, but I don’t know how I can use it in my project. I want to estimate the focal length of the camera based on the height of objects in a video, but I don’t know what the input to the RANSAC algorithm should be.
    Any help will be appreciated…

    1. Hi,

      I’m not sure if RANSAC is the answer you’re looking for. Your problem seems more like a geometry problem, rather than a model fitting (in the presence of outliers) problem.

Leave a Reply

Your email address will not be published. Required fields are marked *