This my attempt at using the GPU to calculate the homography between an image using RANSAC. Each RANSAC iteration is done in parallel. The random number generation used by RANSAC was done the CPU and uploaded the GPU. You might also find the following useful in this code:

- Example of using OpenCV’s GPU SURF code for detecting and matching
- SVD implemented as a CUDA kernel function, with parameters to specify the matrix size

In regards to the SVD function, I ported a version of GNU Scientific Library Jacobi SVD.

# Download

**Last update:** 7th July 2012

CUDA_RANSAC_Homography_1.1.1.tar.bz2

This code requires OpenCV 2.x installed and CUDA development libraries. The code is targeted at Linux but should be easy to compile in Windows.

# Compiling

Use CodeBlocks to open the project and compile. CUDA is assumed to be installed at /usr/local/cuda.

# Usage

CUDA_RANSAC_Homography accepts 3 parameters:

**./bin/Release/CUDA_RANSAC_Homography [img.jpg] [target.jpg] [results.png]**

The first two are the input images and the third is the output of the matched features. Some sample images are included in the **sample** directory.** **

There are also two #define options in CUDA_RANSAC_Homography.h

**#define NORMALISE_INPUT_POINTS**

**#define BIAS_RANDOM_SELECTION**

**NORMALISE_INPUT_POINTS **normalises the point before calculating the Homography. In theory, it should provided better stability. Try with and without and see what you get.

**BIAS_RANDOM_SELECTION **uses my variant of RANSAC that uses a bias random number generation, than than uniform like the vanilla implementation. The bias random number is based on the score of the matched feature pairs. Features with better scoring matches will tend to get picked more often than those with lower scores. This option is useful when the number of iterations is low for the given inliers/outlier ratio.

# License

I’m releasing my part under the Simplified BSD license, but the CUDA SVD kernel code is GPL (whatever version GSL is under).

Hi, Thank you for sharing your code.

I have a question about that. Can I use the RANSAC part of the “GPU RANSAC Homography” for simple Line fitting application?

Thanks.

Hi,

Can you clarify a bit? Did you mean can my GPU code be used to do line fitting?

yes, and can I use it in matlab?

You’ll have to modify the code to do line fitting. The code as is won’t run in Matlab.

This code proved to be a great starting point for me, thanks for posting! I just wish I could find a suitable one-sided Jacobi SVD implementation with a permissive license.

I was wondering, did you ever find the problem with the point normalization (or SVD, or denormalization)? I noticed the current code replaces the average distance with the std deviation (for performance?), but even so I would expect to see a net positive effect, particularly in image pairs with such a large disparity.

Glad you found it useful ðŸ™‚

I never really did investigate further on the normalization step. I can’t remember why I commented out the average distance code. It could very well be for performance reason, since there are slightly less sqrts called. If you ever suceed in using normalisation let me know! I’d be keen to know what’s going on.

I found the problem – starting on line 201 in CUDA_RANSAC_Homography.cu, when filling in the last row, src and dst are not normalized, as they are with every other row. To fix, change from:

// Fill the last row

float srcx = src[3].x;

float srcy = src[3].y;

float dstx = dst[3].x;

float dsty = dst[3].y;

to:

float srcx = src_scale * ( src[3].x – src_mean.x );

float srcy = src_scale * ( src[3].y – src_mean.y );

float dstx = dst_scale * ( dst[3].x – dst_mean.x );

float dsty = dst_scale * ( dst[3].y – dst_mean.y );

I haven’t tested with wide-baseline pairs, so let me know if it gives you a significant improvement – with my data there’s not much of a difference with normalization.

A few notes:

My GPU supports doubles, and after modifying both the original (QR-based) SVD code and yours to use either floats or doubles, I found that doubles give very similar results but are a bit slower to use compared to floats – the inlier ratio is much more important.

Also, regardless of whether your image pairs contain significant rotation, I found the Jacobi iteration gives better results. This is because the orthogonalization has a significant effect on the smallest eigenvalues/vectors, which doesn’t matter in many applications but that’s precisely what’s important here.

Finally, when I was going through the code I looked at other implementations for reference, and I found some interesting things. For instance, in OpenCV they actually calculate the eigenvalues of AtA explicitly (!!) which is something I’ve always heard is a no-no for stability reasons – but they also do simple anisotropic normalization which doesn’t involve any squaring or square roots (just absolute values). In one Matlab implementation, they use the redundant 3rd equation for all matched pairs, not just the last one.

I haven’t had a chance to test all these variations yet, but I hope this helps someone else!

Thanks for the update. I’ll upate my code with that bug you found.

I haven’t had a chance to look at different SVD implementation. GSL’s SVD was the first one I found and implemented it.

That OpenCV find is interesting, I shoud check it out when I have time. I too have read that AtA is a bad thing (magnifying the difference in magnitude of values). Bad in theory but ok in practice?

I never really got a satisfactory answer on the normalisation method used in Multiview Geometry. I might have missed something. They said to do that, but don’t mention if other methods are equally valid.

The impression I got from MVG was that the anisotropic scaling tested was one using the elliptical axes of the points (which would explain the “extra effort” necessary) vs OpenCV, which just scales along the X and Y axis, and actually takes less effort. For only 4 points, I’d bet all the methods considered give very similar results, so it makes sense to use the cheapest one.

Can you please share makefile ? or how to compile instructions ?

There’s no Makefile. The instructions are on the website.

Hi, nghiaho12!

The excellent implementation of easy acceleration singular value decomposition from CPU based GSL SVD (BLAS) on CUDA.

In this code we find Homography Matrix (3×3) for optimal perspective transformation between 2D src & 2D dst points.

Also you write good article how to find the optimal/best rotation and translation between 3D src & 3D dst points: http://nghiaho.com/?page_id=671

But how can I find Homography Matrix (3×3) for optimal perspective transformation between 3D src & 2D dst points. Where 2D points are the projection of 3D points on some plane.

Is this enough to change code only in this part – add Z, or do I must to do something else?

for(int i=0; i < 4; i++) {

float srcx = (src[i].x – src_mean.x)*src_scale;

float srcy = (src[i].y – src_mean.y)*src_scale;

float srcz = (src[i].z – src_mean.z)*src_scale; // added

float dstx = (dst[i].x – dst_mean.x)*dst_scale;

float dsty = (dst[i].y – dst_mean.y)*dst_scale;

int y1 = (i*2 + 0)*9;

int y2 = (i*2 + 1)*9;

// First row

X.data[y1 + 0] = 0.0f;

X.data[y1 + 1] = 0.0f;

X.data[y1 + 2] = 0.0f;

X.data[y1 + 3] = -srcx;

X.data[y1 + 4] = -srcy;

X.data[y1 + 5] = -srcz; //-1.0f;

X.data[y1 + 6] = dsty*srcx;

X.data[y1 + 7] = dsty*srcy;

X.data[y1 + 8] = dsty*srcz; // dsty;

// Second row

X.data[y2 + 0] = srcx;

X.data[y2 + 1] = srcy;

X.data[y2 + 2] = srcz; // 1.0f;

X.data[y2 + 3] = 0.0f;

X.data[y2 + 4] = 0.0f;

X.data[y2 + 5] = 0.0f;

X.data[y2 + 6] = -dstx*srcx;

X.data[y2 + 7] = -dstx*srcy;

X.data[y2 + 8] = -dstx*srcz; //-dstx;

}

// Fill the last row

float srcx = (src[3].x – src_mean.x)*src_scale;

float srcy = (src[3].y – src_mean.y)*src_scale;

float srcz = (src[3].z – src_mean.z)*src_scale; // added

float dstx = (dst[3].x – dst_mean.x)*dst_scale;

float dsty = (dst[3].y – dst_mean.y)*dst_scale;

X.data[8*9 + 0] = -dsty*srcx;

X.data[8*9 + 1] = -dsty*srcy;

X.data[8*9 + 2] = -dsty*srcz; //-dsty;

X.data[8*9 + 3] = dstx*srcx;

X.data[8*9 + 4] = dstx*srcy;

X.data[8*9 + 5] = dstx*srcz; //dstx;

X.data[8*9 + 6] = 0.0f;

X.data[8*9 + 7] = 0.0f;

X.data[8*9 + 8] = 0.0f;

bool ret = linalg_SV_decomp_jacobi(&X, &V, &S);

You can’t do this using 3×3. You need the 3×4 projection matrix P, which is made up of P = K*[R|t]. K is the camera matrix, R rotation, t translation. Solving this is quite complex from memory. I never done it myself.