OpenCV port of ‘Robust Pose Estimation from a Planar Target’

Last Updated on December 4, 2013 by nghiaho12

Please see this page for the most up to date information regarding this post and code.

A few months ago I wrote an OpenCV version of the pose estimation algorithm found in the paper Robust Pose Estimation from a Planar Target (2006) by Gerald Schweighofer and Axel Pinz using their Matlab code from the link in the paper.

I originally ported it so I could play around with Augmented Reality, specifically to track a planar maker (business card). Although that project hasn’t received much attention lately I figured someone might find the pose estimation part useful. I  was about to post it up last night when I came across a small bug, which took 2-3 hour to hunt down. Seems like the new OpenCV Mat class has some subtle gotchas but all good now.I checked my results with the original Matlab code and they appear numerically the same, which is always a good sign.

Download RobustPlanarPose-1.1.2.zip

Look at demo.cpp to see how it works.

UPDATE: 12 March 2012

If you plan to use this for Augmented Reality applications then you should modify Line 23 in RPP.cpp from:

ObjPose(model_3D, iprts, NULL, &Rlu, &tlu, &it1, &obj_err1, &img_err1);

to

ObjPose(model_3D, iprts, &Rlu, &Rlu, &tlu, &it1, &obj_err1, &img_err1);

This initialise the guess of the rotation matrix to the last estimate, providing smoother and more accurate results.

KD-tree on GPU for 3D point searching

Last Updated on March 19, 2011 by nghiaho12

This post is a follow up on my previous post about using the GPU for brute force nearest neighbour searching. In the previous post I found the GPU to be slower than using the ANN library, which utilises some sort of KD-tree to speed up searches. So I decided to get around to implementing a basic KD-tree on the GPU for a better comparison.

My approach was to build the tree on the host and transfer it to the GPU.  Sounds conceptually easy enough, as most things do. The tricky part was transferring the tree structure. I ended up representing the tree as an array node on the GPU, with each node referencing other nodes using an integer index, rather than using pointers, which I suspect would have got rather ugly. The KD-tree that I coded allows the user to set the maximum depth if they want to, which turned out to have some interesting effects as shown shortly.  But first lets see some numbers:

nghia@nghia-laptop:~/Projects/CUDA_KDtree$ bin/Release/CUDA_KDtree
CUDA blocks/threads: 196 512
Points in the tree: 100000
Query points: 100000

Results are the same!

GPU max tree depth: 99
GPU create + search: 321.849 + 177.968 = 499.817 ms
ANN create + search: 162.14 + 456.032 = 618.172 ms
Speed up of GPU over CPU for searches: 2.56x

Alright, now we’re seeing better results from the GPU. As you can see I broken the timing down for the create/search part of the KD-tree. My KD-tree creation code is slower than ANN, no real surprise there. However the searches using the GPU is faster, the results I was hoping for. But only 2.56x faster, nothing really to rave about. I could probably get the same speed up, or more, by multi-threading the searches on the CPU. I then started to play around with the maximum depth of my tree and got these results:

GPU max tree depth: 13
GPU create + search: 201.771 + 56.178 = 257.949 ms
ANN create + search: 163.414 + 403.978 = 567.392 ms
Speed up of GPU over CPU for searches: 7.19x

From 2.56x to 7.19x, a bit better! A max tree depth of 13 produced the fastest results. It would seem that there is a sweet spot between creating a tree with the maximum depth required, so that each leaf of the tree holds only a single point, versus a tree with a limited depth and doing a linear search on the leaves. I suspect most of the bottleneck is due to the heavy use of global memory, especially randomly accessing them from different threads, which is why we’re not seeing spectacular speed ups.

The code for the CUDA KD-tree can be downloaded here CUDA_KDtree.zip. Its written for Linux and requires the ANN library to be installed.

Brute Force Nearest Neighbour in CUDA for 3D points

Last Updated on March 10, 2011 by nghiaho12

It has been a while since I touched CUDA and seeing as I haven’t posted anything interesting on the front page yet I decided to write a simple brute force nearest neighbour to pass time. The idea has been implemented by the fast kNN library. The library is restricted to non-commerical purposes.

In the past I’ve needed to do nearest neighbour search mainly in 3D space, particular for the Iterative Closest Point (ICP) algorithm, or the likes of it. I relied on the ANN library to do fast nearest neighbour searches, which it does very well. A brute force implementation at the time would have been unimaginably slow and simply impractical. It would have been as slow as watching paint dry for the kind of 3D data I worked on. On second thoughts, watching paint dry would have been faster. Brute force has a typical running time of O(NM), where N and M are the size of the two data set to be compared, basically quadratic running time if both data sets are similar in size. This is generally very slow and impractical for very large data set using the CPU. You would almost be crazy to do so. But on the GPU it is a different story.

I ran a test case where I have 100,000 3D points and 100,000 query points. Each query point searches for its nearest point. This equates to 100,000,000,000 comparisons (that’s a lot of zeros, did I count them right? yep I did). This is what I got on my Intel i7 laptop with a GeForce GTS 360M:

nghia@nghia-laptop:~/Projects/CUDA_NN$ bin/Release/CUDA_NN
Data dimension: 3
Total data points: 100000
Total query points: 100000

GPU: 2193.72 ms
CPU brute force: 69668.3 ms
libANN: 540.008 ms

Speed up over CPU brute force from using GPU: 31.76x
Speed up over libANN using GPU: 0.25x

ANN  produces the fastest results for 3D point searching. The GPU brute force version is slower but still practical for real applications, especially given its ease of implementation and reasonable running time.

I’d imagine even greater speed up (orders of magnitude?) is achievable with a KD-tree created on the GPU. A paper describing building a KD-tree on the GPU can be found at http://www.kunzhou.net/.  However I could not find any source code on the website. I might give it a shot when I have time.

The demo code I wrote only searches for the nearest point given a query point. Extending the program to find the K nearest neighbour for a bunch of query points should not be too difficult. Each query point would have to maintain a sorted array of indexes and distances.

My code can be downloaded here (CUDA_LK.zip). If you’re on Linux just type make and run CUDA_NN. Just a warning, on Linux if a CUDA program runs for more than 5 second or so the operating system will kill it and display some warning.