Convolutional neural network and CIFAR-10

I’ve been experimenting with convolutional neural networks (CNN) for the past few months or so on the CIFAR-10 dataset (object recognition). CNN have been around since the 90s but seem to be getting more attention ever since ‘deep learning’ became a hot new buzzword.

Most of my time was spent learning the architecture and writing my own code so I could understand them better. My first attempt was a CPU version, which worked correctly but was not fast enough for any serious use. CNN with complex architectures are notoriously slow to train, that’s why everyone these days use the GPU.  It wasn’t until recently that I got a CUDA version of my code up and running. To keep things simple I didn’t do any fancy optimisation. In fact, I didn’t even use shared memory, mostly due to the way I structured my algorithm. Despite that, it was about 10-11x faster than the CPU version (single thread). But hang on, there’s already an excellent CUDA CNN code on the net, namely cuda-convnet,  why bother rolling out my own one? Well, because my GPU is a laptop GTS 360M (circa 2010 laptop), which only supports CUDA compute 1.2. Well below the minimum requirements of cuda-convnet. I could get a new computer but where’s the fun in that 🙂 And also, it’s fun to re-invent the wheel for learning reasons.

Results

As mentioned previously I’m working with the CIFAR-10 dataset, which has 50,000 training images and 10,000 test images. Each image is a tiny 32×32 RGB image. I split the 50,000 training images into 40,000 and 10,000 for training and validation, respectively. The dataset has 10 categories ranging from dogs, cats, cars, planes …

The images were pre-processed by subtracting each image by the average image over the whole training set, to centre the data.

The architecture I used was inspired from cuda-convnet and is

Input – 32×32 image, 3 channels

Layer 1 – 5×5 convolution filter, 32 output  channels/features, Rectified Linear Unit neurons

Layer 2 – 2×2 max pool, non-overlapping

Layer 3 – 5×5 convolution filter, 32 output  channels/features, Rectified Linear Unit neurons

Layer 4 – 2×2 max pool, non-overlapping

Layer 5 – 5×5 convolution filter, 64 output  channels/features, Rectified Linear Unit neurons

Layer 6 – fully connected neural network hidden layer, 64 output units, Rectified Linear Unit neurons

Layer 7 – fully connected neural network hidden layer, 10 output units, linear neurons

Layer 8 – softmax, 10 outputs

I trained using a mini-batch of 128, with a learning rate of 0.001 and momentum of 0.9. At each epoch (one pass through the training data), the data is randomly shuffled. At around the 62th epoch I reduced the learning rate to 0.0001. The weights are updated for each mini-batch processed. Below shows the validation errors vs epoch.

cnn

After 85 epochs the results are:

– training error 7995/40000 ~ 20%

– validation error 3156/10000 = 31.56%

– test error 3114/10000 = 31.14%

Results seem okay until I compared them with results reported by cuda-convnet simplest architecture [1] [2]: ~8 epochs (?), 80 seconds, 26% testing error. Where as mine took a few hours and many more epochs, clearly I’m doing something wrong!!! But what? I did a rough back of the envelope calculation and determined that their GPU code runs 33x faster than mine, based on timing values they reported. Which means my CUDA code and hardware sucks badly.

On the plus side I did manage to generate some cool visualisation of the weights for layer 1. These are the convolution filters it learnt. This result is typical of what you will find published in the literature, so I’m confident I’m doing something right.

weights
Features learnt by Layer 1

You can see it has learnt some edge and colour filters.

One thing I really want to try at the moment is to get my hands on a newer Nvidia card and see how much speed up I get without doing anything to the code.

I’m not releasing any code yet because it’s very experimental and too ugly to show.

13 thoughts on “Convolutional neural network and CIFAR-10”

  1. Nghia, you mentioned that you shuffled the data randomly after each epoch. I wonder how much does it benefit the performance. Do you know if Alex’s ConvNet does shuffling as well?

    1. I can’t remember if Alex’s ConvNet does the shuffling but I did it based on the neural network/machine learning classes on Coursera. It’s to meant to minimize overfitting and stop the network from learning a particular sequence of data.

    1. I don’t have code for this exact one, but you can use the previous code I released and change the parameters for this particular one.

  2. “At each epoch (one pass through the training data), the data is randomly shuffled.” Did you shuffle the training data or testing data?
    If a CNN is trained with a shuffled training data does it make a difference in the accuracy ? If so what do you think the reason behind it is?

    1. I shuffle the training data.

      The shuffling is supposedly meant to prevent the network from learning the same sequence of images and reduce over fitting.

Leave a Reply

Your email address will not be published. Required fields are marked *