A few years ago I spent some time playing around with CIFAR10, a dataset of 32x32 RGB images in 10 classes, with 50,000 training images and 10,000 test images. Back then I had an old CUDA card that wasn't supported by any of the popular machine learning frameworks, so I ended up writing my own CNN code in CUDA from scratch. That took a while but it was fun, and my results were okay: a bit over 80% accuracy on validation and test. In this post I'm revisiting CIFAR10, but this time with fastai. I'm expecting much better results with much less effort!
We can use fastai's built-in function to load CIFAR10, a convenient one-liner.
```python
from fastai.vision.all import *
from fastai.callback.fp16 import *

# downloads the dataset (if not already cached) and returns its local path
path = untar_data(URLs.CIFAR)
```
I'm going to use a pre-trained resnet18 and do transfer learning. In an attempt to speed things up (since I'm paying by the second on Paperspace!) I'll use float16 for training, which is done by calling .to_fp16() on the learner returned by vision_learner. ResNet was originally trained on images of size 224x224, but CIFAR10 images are much smaller at 32x32. That isn't a blocker, since we can train on any image size, but it may affect the results. As an experiment I'll train on images of size 32x32, 64x64, 128x128 and 224x224.
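As a quick aside on the fp16 trade-off, here's a small numpy illustration (my own, nothing fastai-specific): float16 halves the memory per value, but its 11-bit significand means integers above 2048 can no longer be represented exactly.

```python
import numpy as np

# float16 uses 2 bytes per value instead of 4 for float32
a32 = np.zeros(1000, dtype=np.float32)
a16 = np.zeros(1000, dtype=np.float16)
print(a32.nbytes, a16.nbytes)  # 4000 2000

# above 2048 the spacing between representable float16 values is 2,
# so adding 1 to 2048 is rounded away entirely
print(np.float16(2048) + np.float16(1))  # 2048.0
```

In practice fastai's mixed-precision training keeps a float32 copy of the weights to avoid exactly this kind of lost update, so we get the speed without (much of) the precision cost.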
The training parameters are pretty simple. There's no data augmentation or any other tricks. The head is trained for 3 epochs with the body frozen, then the whole network is trained for 10 more epochs.
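The two-phase schedule comes from fine_tune(10, freeze_epochs=3). Roughly, it freezes the pre-trained body, fits the new head for 3 epochs, then unfreezes and fits everything for 10 epochs. Here's a toy sketch of that control flow with a mock learner (my own simplification; fastai's real fine_tune also adjusts learning rates between phases):

```python
class MockLearner:
    """Records the training schedule instead of actually training."""
    def __init__(self):
        self.log = []
    def freeze(self):
        self.log.append("freeze")
    def unfreeze(self):
        self.log.append("unfreeze")
    def fit_one_cycle(self, epochs):
        self.log.append(f"fit {epochs}")

def fine_tune_sketch(learn, epochs, freeze_epochs=1):
    # the essence of fastai's fine_tune (lr scheduling details omitted)
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs)  # train only the new head
    learn.unfreeze()
    learn.fit_one_cycle(epochs)         # train the whole network

learn = MockLearner()
fine_tune_sketch(learn, 10, freeze_epochs=3)
print(learn.log)  # ['freeze', 'fit 3', 'unfreeze', 'fit 10']
```

This also explains the two separate progress tables per run below: one for the 3 frozen epochs, one for the 10 unfrozen epochs.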
```python
xs = []           # image sizes tried
ys = []           # corresponding test accuracies
best_s = None     # best image size so far
best_acc = 0
best_learn = None

for s in [32, 64, 128, 224]:
    dblock = DataBlock(blocks=(ImageBlock(), CategoryBlock()),
                       get_items=get_image_files,
                       get_y=parent_label,
                       item_tfms=Resize(s))
    dls = dblock.dataloaders(path/"train", bs=64)
    learn = vision_learner(dls, models.resnet18, metrics=accuracy).to_fp16()
    learn.fine_tune(10, freeze_epochs=3)
    learn.save(f"cifar10_{s}")

    # run on test set
    test_files = get_image_files(path/"test")
    label = TensorCategory([dls.vocab.o2i[parent_label(f)] for f in test_files])
    pred = learn.get_preds(dl=dls.test_dl(test_files))
    acc = accuracy(pred[0], label).item()
    print(f"{s}x{s}, test accuracy={acc}")

    if acc > best_acc:
        best_s = s
        best_acc = acc
        best_learn = learn
    xs.append(s)
    ys.append(acc)
```
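The labels come from get_y=parent_label, which simply reads the class from the image's containing folder, since CIFAR10 is laid out as train/&lt;class&gt;/&lt;file&gt;. A minimal stdlib equivalent, just to show there's no magic (my own sketch with made-up paths, not fastai's implementation):

```python
from pathlib import Path

def parent_label_sketch(p):
    # the class name is the name of the file's parent directory
    return Path(p).parent.name

print(parent_label_sketch("train/airplane/1234.png"))  # airplane
print(parent_label_sketch("test/ship/42.png"))         # ship
```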
```python
plt.figure(figsize=(5,5))
plt.plot(xs, ys, 'o-', markersize=10)
plt.xlabel("image size NxN")
plt.ylabel("accuracy")
plt.title("CIFAR10 accuracy on test set vs image size");
```
```python
# inspect the best model: confusion matrix and the worst predictions
interp = ClassificationInterpretation.from_learner(best_learn)
interp.plot_confusion_matrix(figsize=(5,5))
interp.plot_top_losses(49, figsize=(30,30))
```
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.602131 | 2.830318 | 0.108400 | 00:10 |
1 | 2.896018 | 2.366538 | 0.178700 | 00:09 |
2 | 2.258265 | 1.974556 | 0.308800 | 00:09 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 2.181820 | 1.926084 | 0.323500 | 00:10 |
1 | 2.036602 | 1.842784 | 0.355300 | 00:10 |
2 | 1.918048 | 1.735874 | 0.394600 | 00:14 |
3 | 1.802964 | 1.636276 | 0.430300 | 00:16 |
4 | 1.711634 | 1.541966 | 0.454900 | 00:15 |
5 | 1.631888 | 1.488092 | 0.474200 | 00:14 |
6 | 1.600144 | 1.448889 | 0.488200 | 00:14 |
7 | 1.554854 | 1.436399 | 0.491600 | 00:14 |
8 | 1.589388 | 1.427464 | 0.495800 | 00:15 |
9 | 1.551987 | 1.418042 | 0.499800 | 00:13 |
32x32, test accuracy=0.49950000643730164
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.375597 | 2.665712 | 0.144100 | 00:12 |
1 | 2.577458 | 1.987854 | 0.322000 | 00:13 |
2 | 1.713933 | 1.359496 | 0.553300 | 00:14 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 1.576168 | 1.293082 | 0.580300 | 00:16 |
1 | 1.454389 | 1.169415 | 0.615600 | 00:12 |
2 | 1.313278 | 1.026540 | 0.658300 | 00:13 |
3 | 1.170163 | 0.933572 | 0.686100 | 00:16 |
4 | 1.099742 | 0.862677 | 0.705800 | 00:14 |
5 | 1.028216 | 0.829689 | 0.716400 | 00:12 |
6 | 0.999216 | 0.788660 | 0.731500 | 00:14 |
7 | 0.960109 | 0.776576 | 0.733300 | 00:15 |
8 | 0.981314 | 0.772111 | 0.735600 | 00:16 |
9 | 0.975562 | 0.772309 | 0.737500 | 00:11 |
64x64, test accuracy=0.7325000166893005
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.292679 | 2.478759 | 0.185500 | 00:13 |
1 | 2.117097 | 1.446210 | 0.522800 | 00:16 |
2 | 1.152057 | 0.796655 | 0.744800 | 00:13 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 1.077487 | 0.745378 | 0.760000 | 00:20 |
1 | 0.950641 | 0.663845 | 0.788200 | 00:17 |
2 | 0.849631 | 0.575981 | 0.814800 | 00:17 |
3 | 0.744194 | 0.514916 | 0.833000 | 00:19 |
4 | 0.688671 | 0.478813 | 0.843700 | 00:16 |
5 | 0.658266 | 0.460058 | 0.848100 | 00:17 |
6 | 0.628235 | 0.445286 | 0.853700 | 00:17 |
7 | 0.617357 | 0.435487 | 0.857300 | 00:17 |
8 | 0.629916 | 0.430896 | 0.857900 | 00:17 |
9 | 0.611062 | 0.438232 | 0.855300 | 00:19 |
128x128, test accuracy=0.8483999967575073
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.428684 | 2.559396 | 0.155800 | 00:29 |
1 | 2.198105 | 1.578945 | 0.462100 | 00:31 |
2 | 1.360745 | 0.948656 | 0.692500 | 00:31 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 1.221270 | 0.898740 | 0.709200 | 00:39 |
1 | 1.125785 | 0.805118 | 0.740400 | 00:37 |
2 | 0.994369 | 0.705828 | 0.769800 | 00:39 |
3 | 0.866615 | 0.631008 | 0.794900 | 00:37 |
4 | 0.803173 | 0.573332 | 0.813500 | 00:37 |
5 | 0.773740 | 0.544131 | 0.822400 | 00:37 |
6 | 0.744626 | 0.523706 | 0.831600 | 00:37 |
7 | 0.728357 | 0.512607 | 0.832600 | 00:37 |
8 | 0.740440 | 0.512368 | 0.832800 | 00:37 |
9 | 0.725927 | 0.512318 | 0.834100 | 00:37 |
224x224, test accuracy=0.8312000036239624
As the image size increases, so does the test accuracy. The accuracy plateaus after 128x128 at around 85%. The best reported accuracy for CIFAR10 is over 99% according to benchmarks.ai, so there's still plenty of room for improvement.