Category Archives: Technical

A bunch of technical posts to help me remember useful stuff, should I become senile in the future.

SFM with OpenCV + GTSAM + PMVS

With some more free time lately I’ve decided to get back into some structure from motion (SFM). In this post I show a simple SFM pipeline using a mix of OpenCV, GTSAM and PMVS to create accurate and dense 3D point clouds.

This code only serves as an example. There is minimal error checking and will probably bomb out on large datasets!


You will need to edit src/main.cpp to point to your dataset or you test with my desk dataset.


There are four main steps in the SFM pipeline code that I will briefly cover

  1. Feature matching
  2. Motion recovery and triangulation
  3. Bundle adjustment
  4. Output for PMVS

If your code editor supports code folding you can turn it on and see each step locally scoped around curly braces in src/main.cpp.

To simplify the code I assume the images are taken sequentially, that is each subsequent image is physically close and from the same camera with fixed zoom.

Feature matching

The first step is to find matching features between the images. There are many OpenCV tutorial on feature matching out there so I won’t go into too much detail.

There are a few ways to go about picking pair of images to match. In my code I match every image to each other. So if there N images, there are N*(N-1)/2 image pairs to match. This of course can get pretty slow since it scales quadratically with the number of images.

I initially tried ORB but found AKAZE to produce more matches. Matches have to past a ratio test (best match vs second best match) and the fundamental matrix constraint to be kept.

The only slightly tedious part is the book keeping of matched features. You need to keep track of features that match across many images. For example, a feature in image 0 may have a match in image 1 and image2. This is important because features that are seen in at least 3 views allow us to recover relative scale between motion.

Motion recovery and triangulation

This is the core part of the SFM pipeline and I take advantage of the following OpenCV functions

  • cv::findEssentialMat
  • cv::recoverPose
  • cv::triangulatePoints

cv::findEssentialMat uses the 5-point algorithm to calculate the essential matrix from the matched features. It’s the de-facto algorithm for finding the essential matrix and is superior to the 8-point algorithm when used in conjunction with RANSAC to filter out outliers. But it is not easy to code up from scratch compared to the 8-point.

The pose from the essential matrix is recovered by cv::recoverPose. It returns a 3×3 rotation (R) and translation vector (t). The pose at camera n is calculated as follows

T_n = T_{n-1}\left[\begin{array}{ccc|c} & & \\ & R & & t\\ & & \\ 0 & 0 & 0 & 1 \end{array}\right]

T_n is a 4×4 matrix.

Given two poses the 2D image points can then be triangulated using cv::triangulatePoints. This function expects two projection matrices (3×4) as input. To make a projection matrix from the pose you also need the camera matrix K (3×3).  Let’s say we break up the pose matrix T_n into R_n and t_n components as shown below.

T_n = \left[\begin{array}{ccc|c} & & \\ & R_n & & t_n\\ & & \\ 0 & 0 & 0 & 1 \end{array}\right]

The 3×4 projection matrix for the camera n is

P_n = K\left[\begin{array}{c|c} R_n^T & -R_n^{T}t_n\end{array}\right]

This process is performed sequentially eg. image 0 to image 1, image 1 to image 2.

There is an ambiguity in the scale recovered from the essential matrix (hence only 5 degrees of freedom). We don’t know the length of the translation between each image, just the direction. But that’s okay, we can use the first image pair (image 0 and image 1) and use that as the reference for relative scale. An illustration of how this is done is shown below. Click on the image to zoom in.

The following calculations occurs

  1. Recover motion between image 0 and 1 and triangulate to get the blue points. These points will be used as the reference to figure out the relative scale.
  2. Recover motion between image 1 and image 2 and triangulate to get the red points. Using our 2D feature matching done earlier we find that they are the same as the blue points. The rotation (R_2) is correct, but the translation (t_2) needs to be scaled.
  3. Find the relative scale by picking a pair of blue points and the matching red points and calculate the ratio of the vector lengths.
  4. Apply the scale to t_2  and update the transformation and re-triangulate the points.

Everything I’ve outlined so far is the basically visual odometry. This method only works if the images are taken sequentially apart.

Bundle adjustment

Everything we’ve done so far more or less works but is not optimal, in the statistical sense. We have not taken advantage of constraints imposed by features matched across multiple views nor have we considered uncertainty in the measurement (the statistical part). As a result, the poses and 3D points recovered won’t be as accurate as they can be. That’s where bundle adjustment comes in. It treats the problem as one big non-linear optimization. The variables to be optimized can include all the poses, all the 3D points and the camera matrix. This isn’t trivial to code up from scratch but fortunately there are a handful of libraries out there designed for this task. I chose GTSAM because I was learning it for another application. They had an example of using GTSAM for SFM in the example directory, which I based my code off.

Output for PMVS

After performing bundle adjustment we have a good estimate of the poses and 3D points. However the point cloud generally isn’t  dense. To get a denser point cloud we can use PMVS. PMVS only requires the 3×4 projection matrix for each image as input. It will generate its own features for triangulation. The projection matrix can be derived from the poses optimized by GTSAM.


Here’s a dataset of my desk you can use to verify everything is working correctly.

Here’s a screenshot of the output point cloud in Meshlab.

If you set Meshlab to display in orthographic mode you can see the walls make a right angle.


EKF SLAM with known data association

In the previous post I wrote a C++ implementation of the EKF localization algorithm from the Probabilistic Robotics book. As a continuation I also wrote an implementation for the EKF SLAM with known data association algorithm. This is similar to EKF localization except we’re also estimating the landmarks position and uncertainty. The code can be found here.

Here’s a screenshot of it in action.

EKF localization with known correspondences

Been a while since I posted anything. I recently found some free time and decided to dust off the Probabilistic Robotics book by Thrun et. al. There are a lot of topics in the book that I didn’t learn formally back during school.

I’ve used the vanilla Kalman Filter for a few projects in the past but not the extended Kalman Filter. I decided to implement the Extended Kalman Filter localization with known correspondences algorithm. The simulation consists of a robot with a range sensor that can detect known landmarks in the world. You move the robot around using the arrow keys. The simulation is written in C++ and uses SDL and OpenGL. Below shows a screenshot. The true pose is in green, gray the EKF estimate and 95% confidence ellipse, red are the landmarks.

One addition I made was handle the case when angular velocity is zero. The original algorithm presented in the book would result in a divide by zero. You can grab and play with the code here.

An interesting behavior that I’ve been trying to understand is the EKF covariance can shrink (reduce uncertainty), even if you are only doing predictions (no correction using landmark). It’s either a coding bug or some side effect of linearization. Either way, it’s driving me nuts!

Using TensorFlow/Keras with CSV files

I’ve recently started learning TensorFlow in the hope of speeding up my existing machine learning tasks by taking advantage of the GPU. At first glance the documentation looks decent but the more I read the more I found myself scratching my head on how to do even the most basic task. All I wanted to do was load in a CSV file and run it through a simple neural network. When I was downloading the necessary CUDA libraries from NVIDIA I noticed they listed a handful of machine learning framework that were supported. One of them was Keras, which happens to build on top of TensorFlow. After some hard battles with installing CUDA, TensorFlow and Keras on my Ubuntu 16.04 box and a few hours of Stackoverflow reading I finally got it working with the following python code.

from keras.models import Sequential
from keras.layers import Dense, Activation

import numpy as np
import os.path

if not os.path.isfile("data/pos.npy"):
    pos = np.loadtxt('data/pos.csv', delimiter=',', dtype=np.float32)'data/pos.npy', pos);
    pos = np.load('data/pos.npy')

if not os.path.isfile("data/neg.npy"):
    neg = np.loadtxt('data/neg.csv', delimiter=',', dtype=np.float32)'data/neg.npy', neg);
    neg = np.load('data/neg.npy')

pos_labels = np.ones((pos.shape[0], 1), dtype=int);
neg_labels = np.zeros((neg.shape[0], 1), dtype=int);

print "positive samples: ", pos.shape[0]
print "negative samples: ", neg.shape[0]


model = Sequential()

model.add(Dense(output_dim=HIDDEN_LAYERS, input_dim=pos.shape[1]))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy']), neg)), np.vstack((pos_labels, neg_labels)), nb_epoch=10, batch_size=128)

# true positive rate
tp = np.sum(model.predict_classes(pos))
tp_rate = float(tp)/pos.shape[0]

# false positive rate
fp = np.sum(model.predict_classes(neg))
fp_rate = float(fp)/neg.shape[0]

print ""
print ""

print "tp rate: ", tp_rate
print "fp rate: ", fp_rate

I happened to have my positive and negative samples in separate CSV files. The CSV files are converted to native Numpy binary for subsequent loading because it is much faster than parsing CSV. There’s probably some memory wastage going on with the np.vstack that could be improved on.

Understanding OpenCV cv::estimateRigidTransform

OpenCV’s estimateRigidTransform is a pretty neat function with many uses. The function definition is

Mat estimateRigidTransform(InputArray src, InputArray dst, bool fullAffine)

The third parameter, fullAffine, is quite interesting. It allows the user to choose between a full affine transform, which has 6 degrees of freedom (rotation, translation, scaling, shearing) or a partial affine (rotation, translation, uniform scaling), which has 4 degrees of freedom. I’ve only ever used the full affine in the past but the second option comes in handy when you don’t need/want the extra degrees of freedom.

Anyone who has ever dug deep into OpenCV’s code to figure out how an algorithm works may notice the following:

  • Code documentation for the algorithms is pretty much non-existent.
  • The algorithm was probably written by some soviet Russian theoretical physicists who thought it was good coding practice to write cryptic code that only a maths major can understand.

The above applies to the cv::estimateRigidTransform to some degree. That function ends up calling static cv::getRTMatrix() in lkpyramid.cpp (what the heck?), where the maths is done.

In this post  I’ll look at the maths behind the function and hopefully shed some light on how it works.

Full 2D affine transform

The general 2D affine transform has 6 degree of freedoms of the form:

T = \left[ \begin{array}{ccc} a & b & c \\ d & e & f \end{array} \right]

This transform combines rotation, scaling, shearing, translation and reflection in some cases.

Solving for T requires a minimum of 3 pairing points (that aren’t degenerate!). This is straight forward to do. Let’s denote the input point to be X= [x y 1] and the output to be Y = [x’ y’ 1], giving:

TX = Y

\left[ \begin{array}{ccc} a & b & c \\ d & e & f \end{array} \right] \left[\begin{array}{c} x \\ y \\ 1\end{array}\right] = \left[\begin{array}{c} x' \\ y' \\ 1\end{array}\right]

Expanding this gives

\begin{array}{c} {ax + by + c = x'} \\ {dx + ey + f = y'}\end{array}

We can re-write this as a typical Ax = b matrix and solve for x. We’ll also need to introduce 2 extra pair of points to be able to solve for x.

\left[\begin{array}{cccccc} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_2 & y_2 & 1 \\ x_3 & y_3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 &x_3 & y_3 & 1\end{array}\right]\left[\begin{array}{c} a \\ b \\ c \\ d \\ e \\ f \end{array} \right] = \left[\begin{array}{c} x_1 \\ y_1 \\ x_2 \\ y_2 \\ x_3 \\ y_3 \end{array} \right]

Now plug in your favourite linear solver to solve for [a, b, c, d, e, f].

If you have more than 3 pair of points then you can do least squares by doing:

\begin{array}{ccc} Ax & = & b \\ A{^T}Ax & = & A^{T}b \\x & =& (A{^T}A)^{-1}A{^T}b \end{array}

Partial 2D affine transform

The partial affine transform mentioned early has a reduced degree of freedom of 4 by excluding shearing leaving only rotation, uniform scaling and translation. How do we do this? We start with the matrices for the transforms we are interested in.

R (rotation) = \left[\begin{array}{cc} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta)\end{array}\right]

S (scaling) = \left[\begin{array}{cc} s & 0 \\ 0 & s \end{array}\right]

t (translation) = \left[\begin{array}{c} tx \\ ty \end{array}\right]

Our partial affine transform is

T = \left[RS | t \right]

Expanding gives

T = \left[ \begin{array}{ccc} \cos(\theta)s & -\sin(\theta)s & tx \\ \sin(\theta)s & \cos(\theta)s & ty\end{array} \right]

We can rewrite this matrix by defining

a = \cos(\theta)s

b = \sin(\theta)s

c = tx

d = ty

T = \left[ \begin{array}{ccc} a & -b & c \\ b & a & d\end{array} \right]

Solving for [a, b, c, d]

TX = Y

\left[ \begin{array}{ccc} a & -b & c \\ b & a & d \end{array} \right] \left[\begin{array}{c} x \\ y \\ 1\end{array}\right] = \left[\begin{array}{c} x' \\ y' \\ 1\end{array}\right]

\begin{array}{c} {ax- by + c = x'} \\ {bx + ay + d = y'}\end{array}

Solving for [a, b, c, d]

\left[\begin{array}{cccc} x_1 & -y_1 & 1 & 0 \\ y_1 & x_1 & 0 & 1 \\ x_2 & -y_2 & 1 & 0 \\ y_2 & x_2 & 0 & 1 \end{array}\right]\left[\begin{array}{c} a \\ b \\ c \\ d \end{array} \right] = \left[\begin{array}{c} x_1 \\ y_1 \\ x_2 \\ y_2 \end{array} \right]

Notice for the partial affine transform we only need 2 pair of points instead of 3.

Final remark

Well, that’s it folks. Hopefully that gives you a better understanding of the 2D affine transform. So when should you use one or the other? I tend to the use the partial affine when I don’t want to overfit because the data has some physical constraint. On the plus side, it’s a bit faster since there are less parameters to solve for. Let me know which one you use for your application! Best answer gets a free copy of OpenCV 3.x 🙂

Computer vision on the FPGA

This is a follow up on my previous post about my first step into the FPGA world. In this post I’ll give a high level description of my implementation of a simple 3×3 convolution image filtering circuit. It does not aim to be fast or efficient, just something that works. If you want to know anything in more detail look at the code or leave a comment below.

I’ll assume the reader is familiar with 3×3 filters like the Sobel edge detector. In this project l implement the Laplacian filter, but you can easily change the filter by changing the hard coded values inside the Verilog file.

Below is a high level overview of my FPGA design. I’ve omitted the input/output pins for each of the block component for simplicity.


The laptop sends the image over the serial one row at a time to the FPGA. The FPGA then immediately sends the processed pixel row back to the laptop. I chose this design because it only requires three row buffers on the FPGA. The Basys2  has 72Kbits of fast RAM.  For an image width of 128 pixels, it only requires 128×3 = 384 bytes of RAM.

The state machine handles all the logic between the serial, BRAM and 3×3 linear filter.

I decided to use serial for communication between the laptop and FPGA because it was simpler, though slower. I did however manage to crank up the serial speed to 1.5 Mbit by adding an external 100Mhz crystal oscillator. Using the default 50Mhz oscillator on the Basys2 I can get up to 1 Mbit.

I use three banks of dual port BRAM (block RAM). The dual port configuration allows two simultaneously read/write to the same BRAM. This allows me to read 3×2=6 pixels  in one clock cycle. For 3×3=9 pixels it takes two clock cycles. I could do better by making the read operation return more pixels. The three banks of BRAM act as a circular buffer. There’s some logic in the state machine that keeps track of which bank of BRAM to use for a given input pixel row.

Right now with the my current BRAM configuration and 3×3 filter implementation it takes 5 clock cycles to process a single pixel, excluding serial reading/writing. If you include the serial transmission overhead then it takes about 380 ms on my laptop to process a 128×128 grey image.

Below is the result of the Laplacian filter on Lena. I ignored the pixels on the border of the image, so the final image is 126×126.



The zip file contains only the necessary Verilog code and a main.cpp for sending the image. The main.cpp requires OpenCV for display. I omitted any Xilinx ISE specific project files to keep it simple. But if there’s any missing files let me know. Your input image must be 128×128 in size to work.

WordPress malware

My site got hit hard with some malicious WordPress trojans (possibly via plugins) and caused the site to get suspended. It’s back up now with a fresh theme. I deleted every plugin and re-installed only the necessary ones. Code highlighting isn’t enabled yet, can’t rememeber what I last installed.

First trip to FPGA land

Been a while since I last posted anything. I’ve been pretty busy at my new job but managed to find some spare time to learn about FPGA. I wanted to see what all the rage is about. I purchased a Digilent Basys2 not too long ago to play with. It’s an entry board under $100, perfect for a noob like myself. Below is the Basys2 board. I bought a serial module so I can communicate using the latop, shown in the top left connected to the red USB cable.


For the first two weeks I went through the digital design book on Digilent’s site to brush up on stuff I should have remembered during my undergrad but didn’t. It’s a pretty good book for a quick intro to the basic building blocks with example code in Verilog/VHDL. I chose the Verilog book over VHDL because I found it easier to read, and less typing.

I then spent the next two or three weeks or so implementing a simple RS232 receiver/transmitter, with help from here. Boy, was that a frustrating project, but I felt I learnt a lot from that experience. That small project helped me learned about RS232 protocol, Verilog, Xilinx ISE and iSim.

Overall, I’m enjoying FPGA land so far despite how difficult it can be. There’s something about being intimately closer to the hardware that I find appealing.

My original intention for learning the FPGA is for image processing and computer vision tasks. The Basys2 doesn’t have a direct interface for a camera so for now I’ll stick to using the serial port to send images as proof of concept. Maybe I’ll upgrade to a board with a camera interface down the track.

I recently wrote a simple 3×3 filter Verilog module to start of with. It’s a discrete 3×3 Laplacian filter.

module filter3x3(
input wire clk,
input wire [7:0] in0,
input wire [7:0] in1,
input wire [7:0] in2,
input wire [7:0] in3,
input wire [7:0] in4,
input wire [7:0] in5,
input wire [7:0] in6,
input wire [7:0] in7,
input wire [7:0] in8,
output reg signed [15:0] q

always @ (posedge clk)
    q <= in0 + in1 + in2 +
          in3 - in4*8 + in5 +
          in6 + in7 + in8;


This module takes as input 8 unsigned bytes and multiples with the 3×3 kernel [1 1 1; 1 -8 1; 1 1 1] and sums the output to q. It is meant to do this in one clock cycle. I’ve tested it in simulation and it checks out. Next step is to hook up to my serial port module and start filtering images. Stay tune!

Fixing vibration on Taig headstock

I’ve been using the Taig for a few weeks now and had noticed significant vibration on the headstock. I don’t have access to another Taig to compare to see if it’s normal but it definitely “felt” too much. Last week I finally got around to addressing it.

The first thing I noticed was the pulley on the motor being slightly wonky (not running true). After a call to Taig they sent me a replacement pulley. They were very helpful and knowledgeable and know their stuff. They suggested other factors that I can check to help with the vibration issue.

When the pulley arrived I replaced the old one and re-aligned the belt using a parallel and eye balling. Sadly, this did not fix the problem. Then I remember one suggestion by the guys at Taig. He said where the motor mount aligns to the 1×1 inch steel block might not be perfectly flat and could cause vibration because the pulley would be on a slight angle. Guess what, he’s right! With the spindle running at 10,000RPM or so, I applied some force to the motor with my hand and noticed the vibration varied a lot.

This is the hack that I settled on. Every time I adjust the spindle speed I screw the far back screw tight, while the other one is lightly tightened. Just enough so the motor doesn’t twist. I got curious and hooked up a dial indicator to the headstock. As I tightened the screw I could see the vibration ramp up.

Here are two pics that summarize the problem (I think) and the hack fix.



Your situation may differ. Play around with the screws and see what happens.For now it seems to do the job. Vibration is at a minimum, I’m happy.

Machining an aluminium block on the Taig CNC mill

I’ve recently had the chance to get back into some CNC’ing with the Taig CNC mill at work. This is the first time I’ve used the Taig and I must say it’s a pretty slick machine for its size. I was tasked with machining out a rectangular block as an initial step for a more complex part. I thought given how much beefier this CNC is compared to my home made ones it should be a walk in the park. It could not be further from the truth!

I’ve consulted many online/offline calculators and read as much posts from other people’s experience to hone in a setting I was happy with. I’ve lost count of how many end mills I’ve broken along the way! Fortunately I ordered some fairly inexpensive end mills to play with so it wasn’t too bad.

The two most common end mills I’ve used are the 3/8″ 1 inch length of cut and a 1/8″ 1 inch length of cut, both 2 flutes and both from Kodiak. To machine out the block I saved some time by removing most of the material with a drop saw (miter saw). This meant less work for the 3/8″ end mill. Here’s a summary of what I used

3/8″ end mill 2 flutes, 1 inch length of cut

  • RPM: ~ 3000
  • feed rate: 150 mm/min (~6 ipm)
  • plunge rate: 50 mm/min (~2 ipm)
  • depth per pass: 2 mm (0.0787″)

1/8″ end mill 2 flutes, 1 inch length of cut

  • RPM: ~10,000
  • feed rate: 500 mm/min (~20 ipm)
  • plunge rate: 200 mm/min
  • depth per pass: 0.5 mm (~ 0.02″)

The 1/8″ end mill isn’t used to machine the block but for another job.

Here is my very basic Taig setup. It is bolted to the table top.


There’s no automatic coolant or air flow installed so I’m doing it manually by hand. Not ideal, but does the job and keeps me alert! I’m using Kool Mist 77. That’s the 3/8″ end mill in the pic. It’s about the biggest end mill that is practical on the Taig. I’ve added rubber bands to the safety goggle to stop it from falling off my head because I wear glasses.

Below shows one side of the block I milled. The surface is very smooth to touch.


But when looking on the other side, where the cutter is going in the Y direction it shows some wavy patterns?! Not sure what’s going on there. It didn’t mess up the overall job because it was still smooth enough that I could align it on a vise. Still, would like to know what’s going on.


Here are some things I’ve learnt along the way

Clear those chips!

I originally only used Kool Mist and just squirting extra hard to clear the chips. This got a bit tiring and I wasn’t doing such a great job during deeper cuts. Adding the air hose makes life much easier. I found 20-30 PSI was enough to clear the chips.

Take care when plunging

I’ve read the 2 flutes can handle plunging okay but I always find it struggles if you are not careful. The sound it makes when you plunge can be pretty brutal to the ear, which is why I tend to go conservative. I usually set the plunge rate to half the feed rate and take off 50 mm/min. If I’m doing a job in CamBam I use the spiral plunging option, which does a very gradual plunge while moving the cutter in a spiral. Rather than a straight vertical plunge, which makes it harder to clear chips at the bottom. I’ve broken many smaller end mills doing straight plunges. I once did a plunge with the 3/8″ end mill that was a bit too fast and it completely stalled the motor.

I probably should look at per-drilling holes to minimize plunging.

Go easy!

I’ve found a lot of the answers from the feed rate calculators rather ambitious for the Taig. They tend to assume you got a big ass CNC with crazy horse power. Some of the calculators I’ve used take into consideration the tool deflection and horse power, which is an improvement. But at the end of the day the Taig is a tiny desktop CNC weighing at something like 38kg. So know its limits!