I’ve been fascinated for the past few months with using RBMs (restricted Boltzmann machines) to automatically learn visual features, as opposed to hand-crafting them. Alex Krizhevsky’s master’s thesis, Learning Multiple Layers of Features from Tiny Images, is a good source on this topic. I’ve been attempting to replicate the results on a much smaller set of data, with mixed results. However, as a by-product I did manage to generate some interesting results.
One of the tunable parameters of an RBM (and of neural networks in general) is the weight decay penalty. This regularisation penalises large weight coefficients to avoid over-fitting (used in conjunction with a validation set). Two commonly used penalties are L1 and L2, expressed as follows:
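Written out in the standard form (with λ denoting the decay strength, my notation here):

$$E_{L1}(\theta) = \lambda \sum_i |\theta_i| \qquad\qquad E_{L2}(\theta) = \lambda \sum_i \theta_i^2$$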
where θ are the coefficients of the weight matrix.
L1 penalises the absolute value of each weight and L2 the squared value. L1 will generally push many of the weights to exactly zero while allowing a few to grow large; L2, on the other hand, tends to drive all of the weights towards smaller values.
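The difference comes from the gradient each penalty contributes to the weight update:

$$\frac{\partial E_{L1}}{\partial \theta_i} = \lambda\,\operatorname{sign}(\theta_i) \qquad\qquad \frac{\partial E_{L2}}{\partial \theta_i} = 2\lambda\,\theta_i$$

The L1 term pulls every weight towards zero by a constant amount regardless of its size (so small weights get pinned at exactly zero), whereas the L2 term shrinks each weight in proportion to its magnitude (so weights rarely reach zero exactly).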
To see the effect of the two penalties I’ll be using a single RBM with the following configuration:
- 5000 input images, normalized to
- no. of visible units (linear) = 64 (16×16 greyscale images from the CIFAR database)
- no. of hidden units (sigmoid) = 100
- batch training size = 100
- iterations = 1000
- momentum = 0.9
- learning rate = 0.01
- weight refinement using an autoencoder with 500 iterations and a learning rate of 0.01
The weight refinement step uses a 64-100-64 autoencoder with standard backpropagation.
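For concreteness, here is a minimal NumPy sketch of the kind of CD-1 training loop described above, with the weight-decay term included. The function name, the `penalty` argument, and the initialisation details are my own choices for illustration; this isn’t the exact code behind these experiments, just the configuration above written out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=100, epochs=1000, batch_size=100,
              lr=0.01, momentum=0.9, weight_cost=0.01, penalty="l1"):
    """CD-1 training of an RBM with linear (Gaussian) visible units and
    sigmoid hidden units, plus an L1 or L2 weight-decay penalty."""
    rng = np.random.default_rng(0)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    vb = np.zeros(n_visible)                      # visible biases
    hb = np.zeros(n_hidden)                       # hidden biases
    dW = np.zeros_like(W)
    dvb = np.zeros_like(vb)
    dhb = np.zeros_like(hb)

    for epoch in range(epochs):
        rng.shuffle(data)                         # shuffle rows in place
        for start in range(0, len(data), batch_size):
            v0 = data[start:start + batch_size]

            # Positive phase: hidden probabilities given the data.
            h0 = sigmoid(v0 @ W + hb)
            h0_sample = (rng.random(h0.shape) < h0).astype(float)

            # Negative phase: one Gibbs step. With linear visible units
            # the reconstruction is just the linear activation.
            v1 = h0_sample @ W.T + vb
            h1 = sigmoid(v1 @ W + hb)

            # Contrastive-divergence gradient estimate.
            grad_W = (v0.T @ h0 - v1.T @ h1) / len(v0)

            # Weight decay; the multiplier effectively gets multiplied by
            # the learning rate because it sits inside lr * (...) below.
            if penalty == "l1":
                decay = weight_cost * np.sign(W)
            else:
                decay = weight_cost * W

            dW = momentum * dW + lr * (grad_W - decay)
            dvb = momentum * dvb + lr * (v0 - v1).mean(axis=0)
            dhb = momentum * dhb + lr * (h0 - h1).mean(axis=0)
            W += dW
            vb += dvb
            hb += dhb
    return W, vb, hb
```

Calling `train_rbm(images, penalty="l1", weight_cost=0.01)` would correspond to the L1 setting shown below; the 64-100-64 autoencoder fine-tuning step is left out of the sketch for brevity.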
For reference, here are the learned weight patterns of the 100 hidden units without any weight decay applied:
As you can see, they’re pretty random and meaningless; there’s no obvious structure. What is amazing, though, is that even with such random patterns you can reconstruct the original 5000 input images quite well using weighted linear combinations of them.
Now, applying an L1 weight decay with a weight-decay multiplier of 0.01 (which gets multiplied with the learning rate), we get something more interesting:
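(To be explicit about how that multiplier enters the update, using the standard CD-1 update rule rather than a transcript of the actual code, the weight step with L1 decay is roughly

$$\Delta w_{ij} \leftarrow m\,\Delta w_{ij} + \epsilon\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}} - \lambda\,\operatorname{sign}(w_{ij})\big)$$

so with ε = 0.01 and λ = 0.01 the effective decay applied per step is ε·λ = 10⁻⁴.)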
And lastly, applying an L2 weight decay with a multiplier of 0.1, we get:
Despite some interesting-looking patterns, I haven’t really observed the edge-like or Gabor-like patterns reported in the literature. Maybe my training data is too small? Need to spend some more time …