In this post I’ll be showing how to use Principal Component Analysis (PCA) to perform linear data reduction for the purpose of data visualisation.

There is a fair bit of material on this subject on the web, some of it quite mathematically involved. I want to try something different and explain PCA from a **geometric** point of view, rather than a purely numerical one. I’m hoping this will make it easier to understand.

# Our problem

Say we have a 3D dataset as shown below.

I picked a noisy 3D plane, made up of 1000 random points, with the centre highlighted in green as a visual aid (you will see why later on). The plane is on a slant of 45 degrees relative to both the x1 and x2 axes. I labelled the graph using x1, x2, x3 to keep it general. I could have used X, Y, Z, but that might imply something specific about the data.
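If you want to generate a similar dataset to follow along with, here is a minimal NumPy sketch. The exact ranges and noise level I used aren’t shown in this post, so those parameters are my own stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 points on the plane x3 = x1 + x2, which slants at 45 degrees
# relative to both the x1 and x2 axes, plus a little off-plane noise.
x1 = rng.uniform(-1, 1, 1000)
x2 = rng.uniform(-1, 1, 1000)
x3 = x1 + x2 + rng.normal(0, 0.05, 1000)

data = np.column_stack([x1, x2, x3])   # one row per point: Nx3
```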

Okay, say now for some reason you want to ‘squash’ the 3D data into 2D for visualisation purposes, but you want the squashing process to capture as much of the **spread/variance** of the data as possible. This is essentially a **data reduction** problem, but with an extra constraint, namely maximising the variance.

# From 3D to 2D without PCA

One way of projecting our data from 3D to 2D, without using PCA, would be to simply pick 2 of the 3 dimensions and plot those. Let’s try that. I’ll pick x1 vs x2.

As you can see it’s not a bad projection. You can see the red and green crosses quite well, with a reasonable amount of spread along both axes. Everything is skewed though. The other combinations, x1 vs x3 and x2 vs x3, produce similar graphs (because of the 45 degree rotation), so I won’t bother plotting them. So can we do better? Intuitively, the best projection is when we are looking directly/perpendicularly at the plane, as shown below.

# PCA to the rescue

Looking back at the previous graph, we want the 3D plane to lie nicely in 2D space, but how do we do it? One intuitive way of doing this is to somehow rotate the plane so that it aligns with one of the 3 possible axis pairs. For example, if we could somehow rotate the plane so that it lies perfectly flat along the x1,x2 axes and ignore x3, then we would have a pretty ideal projection.

PCA is a process involving a bunch of steps that does more or less exactly what I described above. By rotating the data it puts it on a brand new set of axes, which I’ll call pc1, pc2 … pcN to avoid any confusion. In essence, it rotates the data (via an N dimensional rotation matrix) in such a way as to maximise the spread/variance along the new axes pc1, pc2 … pcN, sorted from largest to smallest.

The steps to perform PCA for the purpose of **visualisation** are

- Translate the data so that the centre is at the origin
- Calculate the covariance matrix
- Find the principal components
- Reduce the data using the selected principal components

These steps are different if you are interested in **data compression**, which I won’t discuss in this post. Let’s go through the visualisation steps.

## 1. Translating the data

This step is straightforward: find the average/mean of the 3D data and subtract it from every data point. In Octave this is

```octave
% data is an Nx3 matrix
m = mean(data);
data_m = data - repmat(m, length(data), 1);
```

We have to translate the data because we want to rotate it about its centre.

## 2. Calculate the covariance matrix

Using data_m from the previous step, the covariance matrix is

```octave
N = size(data, 1);
covar = data_m'*data_m/N;
```

This produces a symmetric 3×3 matrix.
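As a sanity check, here is the same calculation sketched in NumPy on stand-in data, confirming the result is a symmetric 3×3 matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 3))    # any Nx3 dataset works here

N = data.shape[0]
data_m = data - data.mean(axis=0)    # centred, as in step 1
covar = data_m.T @ data_m / N        # NumPy version of data_m'*data_m/N

# X'*X is always symmetric, so covar equals its own transpose
# (up to floating point error).
```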

## 3. Finding the rotation

We can find the optimal rotation using Singular Value Decomposition (SVD). I won’t discuss what this function does, since it’s a whole topic on its own. For now we’re just interested in the result it produces. The SVD is called by

```octave
[U,S,V] = svd(covar);
```

The matrices U and V are identical here (because covar is symmetric). I’ll pick V. V is the 3×3 rotation matrix we are interested in.
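If you want to convince yourself that U and V come out the same, here is a quick NumPy check on a random stand-in covariance matrix. For a symmetric positive semi-definite matrix the left and right singular vectors coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
covar = X.T @ X / 100                # symmetric positive semi-definite

U, S, Vt = np.linalg.svd(covar)      # note: NumPy returns V transposed
V = Vt.T

# For a symmetric positive semi-definite matrix, U and V carry the
# same columns (up to floating point error).
```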

S is the singular value matrix and it contains useful information regarding the principal components of V (column vectors). For example

```
S =

Diagonal Matrix

   3.4576e+04            0            0
            0   3.1959e+04            0
            0            0   3.3793e+02
```

If we normalise the diagonals, we get

```
diag(S) / sum(diag(S))
ans =

   0.5297516
   0.4653522
   0.0048962
```

These values can be interpreted as the fraction of the variance each principal component captures; notice they’re sorted from largest to smallest. Since we are interested in the first 2 components, we capture about 0.5297516 + 0.4653522 ≈ 0.995, or 99.5%, of the variance. We expect this to be high for the example dataset because we know for a fact that a plane in 3D can always be projected onto 2D with very little loss of information.
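The same variance-per-component calculation can be sketched in NumPy on a synthetic noisy plane (my own stand-in data, so the exact numbers will differ from the Octave output above):

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.uniform(-1, 1, 1000)
x2 = rng.uniform(-1, 1, 1000)
data = np.column_stack([x1, x2, x1 + x2 + rng.normal(0, 0.05, 1000)])

data_m = data - data.mean(axis=0)
covar = data_m.T @ data_m / len(data)
U, S, Vt = np.linalg.svd(covar)

explained = S / S.sum()   # fraction of total variance per component
# explained is sorted largest to smallest; the first two entries should
# sum to nearly 1, since the data is essentially a 2D plane in 3D.
```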

## 4. Data reduction using principal components

Now that we have our rotation matrix V, we can conceptually rotate the data, align it to the new axes, and just keep the first 2 dimensions, like so

```octave
data_r = data_m*V;              % rotated data
reduced_data = data_r(:, 1:2);  % keep first 2 components
```

But it’s more common, and more efficient, to write

```octave
reduced_data = data_m*V(:, 1:2);
```

as there are fewer matrix operations involved.

## Visualisation

reduced_data is now an Nx2 matrix, which we can plot in 2D. Alright, let’s see what it looks like.

As you can see, the plane now looks like a proper square and the centre circle looks like a circle. Interestingly, PCA also rotated the plane by some amount. Remember earlier I said PCA finds the maximum variance for each principal component. It turns out that rotating a square so that its diagonal aligns with an axis meets this requirement. Intuitively, a square has a bigger bounding box when it’s rotated, and hence slightly larger variance along each axis, if we assume the square is uniformly sampled. This is illustrated below.
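The bounding-box part of this can be checked numerically. A small NumPy sketch (the sample count and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(4)
square = rng.uniform(-1, 1, size=(100000, 2))   # uniformly sampled square

theta = np.pi / 4                                # rotate by 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = square @ R.T

# The axis-aligned square fits inside [-1, 1] on each axis, but once the
# diagonal lines up with an axis the bounding box grows towards
# [-sqrt(2), sqrt(2)] in each direction.
extent_before = np.abs(square).max(axis=0)
extent_after = np.abs(rotated).max(axis=0)
```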

# Summary

There you have it folks: a geometric explanation of using PCA for data visualisation. A summary of the Octave commands is

```octave
N = size(data, 1);
m = mean(data);                   % each row is a data sample
data_m = data - repmat(m, N, 1);
covar = data_m'*data_m/N;         % or N-1 for unbiased estimate
[U,S,V] = svd(covar);
reduced_data = data_m*V(:,1:2);   % reduce to 2 components
```

You can also use the **cov()** function to calculate the covariance instead of doing it step by step.

Everything I have explained is applicable to higher dimensions. That’s all there is to it!
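For readers without Octave, the whole pipeline can be sketched as a self-contained NumPy function (the function name and test data below are mine):

```python
import numpy as np

def pca_reduce(data, k=2):
    """Project data (one sample per row) onto its first k principal
    components, following the Octave steps above."""
    data_m = data - data.mean(axis=0)          # 1. centre at the origin
    covar = data_m.T @ data_m / len(data)      # 2. covariance matrix
    U, S, Vt = np.linalg.svd(covar)            # 3. principal components
    return data_m @ Vt.T[:, :k]                # 4. keep the first k

# Example: squash a noisy 3D plane down to 2D, ready for plotting.
rng = np.random.default_rng(5)
x1 = rng.uniform(-1, 1, 1000)
x2 = rng.uniform(-1, 1, 1000)
data = np.column_stack([x1, x2, x1 + x2 + rng.normal(0, 0.05, 1000)])

reduced = pca_reduce(data, k=2)
```

Taking the SVD of the covariance matrix keeps the code parallel to the Octave version; applying `np.linalg.svd` to the centred data directly would give the same components.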

Hey! I came across your blog while looking for PCA. I would love to try out this experiment. What I don’t understand is where I can get a 3-D image database from so that I can convert that to 2-D using the PCA for visualization of 3-D images technique. Kindly help me.

Try this

http://graphics.stanford.edu/data/3Dscanrep/

they’re old datasets, but still used.

Will do. But before that, I’m going to work on a project on physical feature recognition in humans using pca. Would like to know if the same steps can be followed? Or are your steps case specific? Also, does physical feature recognition in humans come under data visualization? Thanks!

If you’re talking about Eigenfaces then it’s nearly identical, except with extra pre-processing for image data. This wouldn’t fall under data visualization, probably something more specific like facial recognition.

Hey, do you know how to dump the extracted feature vectors to a new folder in Matlab versions 7 and 2013a? I’m relatively new to this and there are tons of functions on the net that don’t work. I’m using a for loop to read the images from a folder. Then I’m applying the PCA technique. And then, I’d like to export the feature vectors into a new folder.

to a new folder*

I don’t use Matlab so won’t be able to help you there. In Octave I can do

```octave
for i = 1:10
  filename = sprintf("file-%03d.mat", i);
  save("-ascii", filename, "A");
end
```

where “A” is the variable to be saved out.

The commands in Octave are similar to Matlab so it might be a copy and paste.

Yes, that was incredibly useful. Thanks a lot!

Hey, first of all, excellent page. Secondly, I have a problem adding the pixels on Matlab. Came across this page when I was looking for solutions. Could you suggest something please? I have 320 64*64 pics and I want to add these pics, pixel by pixel and then average them out. I tried for two pics first using the following code but the error I’m getting is:

```
Error using +
Integers can only be combined with integers of the same class, or scalar doubles.

Error in tryingtoaddtwopics (line 7)
a = a + img;
```

=====================================================================

And this is my code:

```matlab
clc;
i = 1;
a = zeros(64, 64);  % 64x64 zero matrix

for i = 1:1:2
  img = imread(['C:\MATLAB7\work\NEW\fr (' int2str(i) ').jpg']);  % to read the pics
  img = rgb2gray(img);  % to make it 2D. Is this even required?
  a = a + img;          % adds to matrix in every step
end

a = a/2;  % to compute avg
disp(a);
```

What should I do?

Please help. Thank you!

OK, sorry to bother you. I got it. Anyway, may I have your email id for future reference? Or can I post my queries here, if that’s OK with you?

You can find my contact in the About page at the bottom.

I’m not using the SVD technique right now. Using eigen vecs.

Anyway,

Please could you clarify what

```matlab
[junk, rindices] = sort(-1*V);
V = V(rindices);
```

means?

Sort function with a ‘*’? And V is the matrix with extracted diagonal values.

it’s multiplying V by -1 before sorting. sort() is ascending by default, so negating the values first effectively gives a descending sort, i.e. the eigenvalues ordered from largest to smallest.
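The negate-then-sort idiom is easy to check; a quick NumPy sketch using the eigenvalues from the post above:

```python
import numpy as np

V = np.array([3.3793e+02, 3.4576e+04, 3.1959e+04])  # eigenvalues, unsorted

# argsort() is ascending by default, so sorting the negated values
# orders the originals from largest to smallest; rindices can then
# reorder the matching eigenvectors the same way.
rindices = np.argsort(-V)
V_sorted = V[rindices]
```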

Hey, these were the steps I followed for object identification from a set of training and testing images:

1. convert each image from RGB to grayscale

2. calculate the mean of all 700 images in the database

3. subtract the mean obtained above from each image to normalize them

4. compute the covariance of each image using cov()

5. use eigs() on each covariance matrix above and extract the first 6 eigenvectors for each image

now I shall store them in a mat file and use a distance metric to compare the test images.

Correct?

That sounds about right. You’ll need to store the mean image as well, to centre the new data.

Can you elaborate on that? What’s new data?

Also I’m storing the eigen values of each pic in a separate mat file. How do I traverse these using my distance metric?

When you’re trying to classify “unseen” data (new data).

I suppose you can load each file individually, store them into a matrix and loop over each one.

Hey, I just read this, thanks. This is my code; I tried it for gender recognition. It looks right to me but when I run it on Matlab, I get different distances for the same image every time I run it with the female and male db. Could you tell me where I’m going wrong?

256 female images train db, 465 male images train db.

This is code for matching. After following the steps mentioned above.

```matlab
i = 1;
fem_mean = load('fem_mean.mat');  % mean fem. images
emin_f = 100;

image = imread(['C:\ProgramFiles\2013_MATLAB\R2013a\work\testFemales\tf (' int2str(1) ').jpg']);  % first test image
image = rgb2gray(image);
image = double(image);

imagef = image - fem_mean.a;  % subtract fem. mean
covar = cov(imagef);          % cov matrix
[PCV, D] = eigs(covar, 6);    % extract six significant eig vecs

for i = 1:1:256  % 256 female images
  filename = sprintf('f_train_eig_%d.mat', i);  % to load the PCV values in each of the 256 images
  obs = load(filename);
  euc_dist = sqrt(sum((PCV - obs.PCV).^2));  % euclidean distance
  euc_dist = euc_dist/64;                    % MSE
  ecol_f = sum(euc_dist, 2);  % sum through columns to get one value for distance so I can compare this with the male euc_distance value
  if (ecol_f < emin_f)  % to determine smallest distance between test and the whole female db
    emin_f = ecol_f;    % set emin_f to smallest value
  end
end

disp(emin_f);
```

You don’t calculate the covariance again for the test images; it’s done once during training. You should be storing the mean + the 6 eigenvector matrix. The eigenvector matrix should be of size 6xN, where N is width*height, which is 64×64 = 4096 in your case. Then for every image you want to test you just do:

eigen_vector_matrix * (test_image - mean) ==> 6 coefficients

the mean is a vector 4096 long.
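To make the shapes concrete, here is a tiny NumPy sketch with random stand-in data (the sizes are from your setup, the values are made up):

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in shapes: 6 eigenvectors stored as rows, and 64x64 images
# flattened to length-4096 vectors.
eigen_vector_matrix = rng.normal(size=(6, 4096))
mean = rng.normal(size=4096)           # mean training image, flattened
test_image = rng.normal(size=4096)     # one flattened test image

coeffs = eigen_vector_matrix @ (test_image - mean)   # the 6 coefficients
```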

Also, make sure your training process is correct. The image you load needs to be converted into a column vector with image = image(:). It seems in your case you’re treating the image as a 2D matrix when it should be 1D. Taking your female database as an example, if we store all the images in a big matrix X with dimensions 4096×256, then the covariance is

covariance = X*X’ / 256

oh and the mean calculation looks like:

```matlab
a = zeros(64, 64);
for i = 1:1:256
  b = % read image (dim: 64x64)
  a = a + b;
end
a = a/256;
```

and a is the mean of the female db pics

Gosh, need to do the entire thing again. How do I concatenate the 256 vectors to calculate the covariance matrix? I’m not storing the images as .mat files.

THANKS.

Oh also, I should store the answer from

eigen_vector_matrix * (test_image - mean) ==> 6 coefficients

in a variable, say to_match, and find the distance between to_match and each of the stored 6-coefficient vectors in my female and male databases to determine which gives the least distance value, correct?

you can use

X = [X A]

to concat column A to X

Alright. Once I do the:

eigen_vector_matrix * (test_image - mean)

what should I do? I’ll have 6 coefficients with me but how do I determine the gender of the image?

A few ways to do this. If you think of the female and male training data as a bunch of points in 6D space, they might form 2 distinct clusters (hopefully). You then see which cluster your test data is closer to. You can do this by finding the single closest training point and picking its class as your answer, or by picking the 5 closest and taking a vote (k-nearest neighbours).
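If it helps, here is a rough NumPy sketch of the 1-nearest-neighbour decision, with synthetic clusters standing in for your databases (all the sizes and values are made up):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic 6D coefficient clusters standing in for the two databases.
female = rng.normal(loc=0.0, size=(256, 6))
male = rng.normal(loc=5.0, size=(465, 6))
test = np.full(6, 5.1)                 # a test point near the male cluster

def min_dist(db, point):
    # distance from the point to its nearest neighbour in the database
    return np.sqrt(((db - point) ** 2).sum(axis=1)).min()

# 1-nearest-neighbour: the class of the single closest training point.
label = 'female' if min_dist(female, test) < min_dist(male, test) else 'male'
```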

You can also change the training slightly by combining both female/male so you only have 1 mean and 1 eigenvector matrix for both.

Yep, that’s correct.

Just saw that. Thanks!

Hey! Thanks for such a nice explanation of PCA.

I am currently trying to do PCA on a 2D image, but I am unable to figure out how to retrieve the image after projecting the original image onto the principal component (1st eigenvector).

The following is the code I am using in Matlab:

```matlab
clear
clc

image = imread('C:\Matlab Codes\small.jpg');
image_grey = image(:,1:144);

A = image_grey;
A = double(A);

meanA = mean(A);
meanA = double(meanA);
stdA = double(std(A));
stdA = double(stdA);

sizeA = size(A);
row_count = sizeA(:,1);
col_count = sizeA(:,2);

X = (A - meanA(ones(row_count,1),:))./(stdA(ones(row_count,1),:));
covariance = cov(X);

[V,D] = eig(covariance);
[rowev,colev] = find(D == max(D(:)));

D = diag(D);
[junk, rindices] = sort(-1*D);
D = D(rindices);
V = V(:,rindices);

pca1 = X' * V(:,1);
```

It’ll be great if I can get any help on how to plot the image using just the principal components.

Try this

http://ecammar.blogspot.com/2013/04/image-compression-using-principal-of.html#comment-form

You should be able to just scale the first eigenvector by pca1, add back the mean, and get the image you want. Assuming pca1 is a scalar.

Hi, thank you very much for this post. Actually I have a fully connected graph which can only be visualised in n dimensions, where n is the number of nodes minus 1. Now I want to implement PCA to determine which of those dimensions are most important and be able to view the network through the lens of those dimensions. Can you help me, please?

I don’t know what you are trying to do but perhaps the other blog post that I linked to in my last comment might help.

Hey,

Your code helped me. Here is the version I’m using with Python and NumPy:

```python
import numpy as np

m = np.mean(data, axis=0)
data_m = data - m
covar = np.cov(data_m.T)
U, S, Vt = np.linalg.svd(covar)
V = Vt.T
reduced_data = np.dot(data_m, V[:, :2])
```

And if anyone is interested in plotting the principal axes, it’s just

```python
e1 = V[:, 0]
e2 = V[:, 1]
```

Hello! Thanks very much for your explanation. I have tried your code and it works well (I work with Matlab, still good). However, there is an issue with the alignment.

I have a simple bridge point cloud; the left pier doesn’t align with the right one, which means the two piers cannot overlap on the projected plane.

Do you know why? Thanks in advance!

There’s not enough information for me to figure this out.

So when you say in the last para that “Remember earlier I said PCA finds the maximum variance for each principal component. It turns out rotating a square so that the diagonal align an axis meets this requirement. Intuitively, a square has a bigger bounding box if it’s rotated, and hence slightly larger variance along each axis, if we assume the square is uniformly sampled”.

Here, does the axis mean the side of the square or the centre of the square or something else?

The global x,y axes. I should have drawn them, but you can try to picture the axes on the bounding box: bottom left is (0,0), up is the y-axis, and right is the x-axis.