Optical flow on CUDA

This is my own CUDA implementation of the Lucas Kanade optical flow algorithm, based on the paper:

Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the algorithm, by Jean-Yves Bouguet.

I have always wanted to learn how to program using CUDA, so I decided to start by implementing the Lucas Kanade optical flow algorithm. My version calculates optical flow for every pixel (dense optical flow), as opposed to only at selected features (sparse). This makes coding much easier (no need to write a feature detector), and having a dense field is always nice.

I get about a 30x speed-up over OpenCV’s version. However, the results are slightly different: OpenCV tends to produce slightly more accurate results, which I have yet to replicate.

Download

Requirements

  • OpenCV
  • Nvidia graphics card with CUDA installed
  • GCC

Download cudaLK.zip

On Linux you can compile the code by running the compile script. Most likely you’ll have to edit the script to match your system. By default it links against the 64-bit CUDA libraries. The program is run from the command line as follows:

./cudaLK img1.png img2.png

This will produce cudaLK.png and opencv.png files for comparison, with the optical flow drawn every 16 pixels. The input images can be in any of the popular image formats supported by the OpenCV library (e.g. jpeg, bmp, png).

IMPORTANT NOTE

This code is by no means production quality. It was written just so I could get familiar with programming in CUDA. It won’t produce results as high in quality as OpenCV’s, but it should nonetheless serve as a rough guide for comparison.

Results on my laptop

The results were obtained on my Asus laptop with the following specs:

  • Intel i7 Q720 @ 1.60GHz
  • Nvidia GeForce GTS 360M 1GB VRAM
  • 6GB RAM
  • Ubuntu 10.04 LTS 64-bit version
  • CUDA 3.0

The following parameters were used for the optical flow calculation:

Parameter                          Value
Image size                         1280×640
Patch size                         13×13
Pyramid level                      3
Maximum iterations                 10
Termination condition              delta < 0.01 pixels
Dense optical flow (all pixels)    Yes
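
To show how the patch size, iteration limit and termination condition fit together, here is a minimal CPU sketch of the iterative Lucas Kanade update for a single pixel at a single pyramid level. It is not the cudaLK kernel. The names (Flow, Image, trackPixel, imageA, imageB) are made up for illustration, the images are synthetic functions rather than real frames, and I am assuming "delta" means the length of the last update step.

```cpp
#include <cmath>

struct Flow { float x, y; };

// Hypothetical image interface: intensity at a (possibly fractional) position.
typedef float (*Image)(float x, float y);

// Synthetic test pair: imageB is imageA shifted by (0.7, -0.4) pixels.
float imageA(float x, float y) { return std::sin(0.1f * x) + std::cos(0.13f * y); }
float imageB(float x, float y) { return imageA(x - 0.7f, y + 0.4f); }

// Central-difference gradient of I at (x, y).
static void gradient(Image I, float x, float y, float& ix, float& iy)
{
    ix = 0.5f * (I(x + 1.0f, y) - I(x - 1.0f, y));
    iy = 0.5f * (I(x, y + 1.0f) - I(x, y - 1.0f));
}

// Estimates the flow of pixel (px, py) from image I to image J.
Flow trackPixel(Image I, Image J, float px, float py,
                int patchSize = 13, int maxIterations = 10, float minDelta = 0.01f)
{
    const int r = patchSize / 2;
    Flow v = {0.0f, 0.0f};

    // 2x2 gradient matrix G, accumulated over the patch in the first image.
    float gxx = 0.0f, gxy = 0.0f, gyy = 0.0f;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            float ix, iy;
            gradient(I, px + dx, py + dy, ix, iy);
            gxx += ix * ix; gxy += ix * iy; gyy += iy * iy;
        }
    const float det = gxx * gyy - gxy * gxy;
    if (det < 1e-6f) return v;  // textureless patch: G is not invertible

    for (int iter = 0; iter < maxIterations; ++iter) {
        // Mismatch vector b: image difference weighted by the gradient.
        float bx = 0.0f, by = 0.0f;
        for (int dy = -r; dy <= r; ++dy)
            for (int dx = -r; dx <= r; ++dx) {
                const float x = px + dx, y = py + dy;
                float ix, iy;
                gradient(I, x, y, ix, iy);
                const float diff = I(x, y) - J(x + v.x, y + v.y);
                bx += diff * ix; by += diff * iy;
            }
        // Solve G * step = b with the closed-form 2x2 inverse.
        const float stepX = (gyy * bx - gxy * by) / det;
        const float stepY = (gxx * by - gxy * bx) / det;
        v.x += stepX; v.y += stepY;
        if (std::sqrt(stepX * stepX + stepY * stepY) < minDelta) break;
    }
    return v;
}
```

Calling trackPixel(imageA, imageB, 20.0f, 15.0f) should recover a flow close to (0.7, -0.4). In a pyramidal tracker this loop runs once per level, from the coarsest level down, with each level's result used as the starting guess for the next.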

I used the following two images extracted from a Ladybug camera sample video from Point Grey (hope they don’t mind). You can download the originals by clicking on the thumbnails.

The table below shows a time breakdown of the stages involved. I used the gettimeofday() function to time the different stages as seen from the CPU, and included GPU times from the CUDA profiler for a more accurate breakdown. The CUDA profiler does account for CPU time, but only for function calls, not for sections of code.
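
A CPU-side timer of this kind could look roughly like the sketch below. The helper names wallClockMs and timeStageMs are made up for illustration and are not taken from cudaLK.

```cpp
#include <sys/time.h>  // gettimeofday (POSIX)
#include <cstddef>

// Wall-clock time in milliseconds, as seen from the CPU.
double wallClockMs()
{
    timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Runs one stage and returns how long it took in milliseconds.
// `stage` is any callable. For a GPU stage it should end with a blocking
// call (for example cudaDeviceSynchronize(), or cudaThreadSynchronize() in
// CUDA 3.0), because kernel launches return before the GPU has finished.
template <typename Stage>
double timeStageMs(Stage stage)
{
    const double start = wallClockMs();
    stage();
    return wallClockMs() - start;
}
```

Timing from the CPU like this includes launch and synchronisation overhead, which is one reason the CPU and GPU columns do not match exactly.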

Operation                            CPU (ms)   GPU (ms)
Copying 2 images from CPU to GPU            1          -
Converting RGB to greyscale                 1      0.947
Generating the pyramids                     1      0.665
Optical flow                              907    904.682
Copying results from GPU to CPU             7          -
Total time for cudaLK                     918          -
OpenCV’s optical flow (8 threads)       28194          -
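
For readers unfamiliar with the two preprocessing stages in the table, here is a CPU sketch of what they compute. Both functions are illustrative assumptions rather than the cudaLK kernels: the greyscale weights are the common Rec. 601 luma weights, and the pyramid step uses a plain 2×2 average, which is simpler than the smoothing filter described in Bouguet's paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleaved 8-bit RGB -> float greyscale (Rec. 601 weights, assumed).
std::vector<float> rgbToGrey(const std::vector<unsigned char>& rgb, int width, int height)
{
    assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);
    std::vector<float> grey(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < grey.size(); ++i)
        grey[i] = 0.299f * rgb[3 * i] + 0.587f * rgb[3 * i + 1] + 0.114f * rgb[3 * i + 2];
    return grey;
}

// One pyramid step: halve both dimensions by averaging each 2x2 block.
// Assumes even width and height.
std::vector<float> downsampleHalf(const std::vector<float>& src, int width, int height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    const int w = width / 2, h = height / 2;
    std::vector<float> dst(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const float* row0 = &src[(2 * y) * width + 2 * x];
            const float* row1 = row0 + width;
            dst[y * w + x] = 0.25f * (row0[0] + row0[1] + row1[0] + row1[1]);
        }
    return dst;
}
```

Applying downsampleHalf repeatedly to the greyscale image gives the coarser pyramid levels. Each output pixel is independent of the others, which is why both stages map well onto one GPU thread per pixel.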

That works out to be about 892,000 optical flow pixels per second using CUDA. Pretty good! Compared with OpenCV’s highly optimised CPU implementation utilising all 4 cores (8 threads), the GPU version is about 30x faster.
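
Both figures follow directly from the table. The small sketch below (function names made up for illustration) spells out the arithmetic: every pixel of the 1280×640 frame gets one flow vector, and the totals are 918 ms for cudaLK and 28194 ms for OpenCV.

```cpp
// Flow vectors computed per second when every pixel of the frame is tracked.
double flowPixelsPerSecond(int width, int height, double totalMs)
{
    return static_cast<double>(width) * height / (totalMs / 1000.0);
}

// Ratio of the baseline time to the accelerated time.
double speedUp(double baselineMs, double acceleratedMs)
{
    return baselineMs / acceleratedMs;
}
```

flowPixelsPerSecond(1280, 640, 918.0) gives roughly 892,000 and speedUp(28194.0, 918.0) gives roughly 30.7.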

And of course the actual results. Obviously there is some room for improvement …

Results from CudaLK

Results from OpenCV

Some thoughts

As shown, the GPU implementation is much faster than the equivalent CPU implementation, even when all cores are utilised. I made use of CUDA’s texture memory, which is not only faster than global memory because of caching, but also has hardware support for bilinear interpolation. One thing I did not implement is explicit boundary checking for when the patch is partially off the image. I relied on the texture memory returning a clamped value for pixels off the texture and hoped they didn’t affect the overall tracking significantly.
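
The clamped, bilinearly interpolated read that the texture unit provides can be mimicked on the CPU. The sketch below is a reference for the idea, not the cudaLK code. The function names are made up, and it places pixel centres at integer coordinates, whereas CUDA textures with unnormalised coordinates place them at +0.5.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Reads pixel (x, y) from a row-major image, clamping the coordinates so
// that off-image reads repeat the nearest border pixel.
float fetchClamped(const std::vector<float>& img, int width, int height, int x, int y)
{
    x = std::max(0, std::min(x, width - 1));
    y = std::max(0, std::min(y, height - 1));
    return img[y * width + x];
}

// Bilinear sample at a fractional position, built from four clamped reads.
float sampleBilinear(const std::vector<float>& img, int width, int height, float x, float y)
{
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;  // horizontal weight of the right-hand pixels
    const float fy = y - y0;  // vertical weight of the lower pixels
    const float top = (1.0f - fx) * fetchClamped(img, width, height, x0, y0)
                    + fx * fetchClamped(img, width, height, x0 + 1, y0);
    const float bottom = (1.0f - fx) * fetchClamped(img, width, height, x0, y0 + 1)
                       + fx * fetchClamped(img, width, height, x0 + 1, y0 + 1);
    return (1.0f - fy) * top + fy * bottom;
}
```

With clamping, a patch that hangs over the edge of the image sees the border pixels repeated outwards. That adds no real texture, but it also never reads invalid memory, which is the trade-off I accepted by skipping explicit boundary checks.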

With some extra work to get the optical flow quality up to OpenCV’s level, I would still expect the GPU version to run at least 15-20x faster. Maybe in the future when I get around to it.
