A convolutional neural network trained from scratch to classify animal photographs. 91.43% test accuracy across 7 categories.
Batch predictions on the test set. Green = correct, red = misclassified.
Given a photograph, the model predicts one of seven categories: squirrel, lion, horse, elephant, chicken, camel, or bear. With no pre-trained weights and only 2,392 training images, every component was built from scratch.
Three convolutional layers with increasing filter counts (32→64→128) build the feature hierarchy from edges and textures up to shapes and patterns.
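The 32→64→128 layout above can be sketched in PyTorch as follows. This is a minimal illustration, not the project's actual code: the class name `SmallAnimalCNN`, the 3×3 kernels, the 2×2 max-pooling, the global-average-pooled classifier head, and the 128×128 input size are all assumptions made for the example.

```python
import torch
from torch import nn


class SmallAnimalCNN(nn.Module):
    """Hypothetical three-block CNN with 32 -> 64 -> 128 filters."""

    def __init__(self, num_classes: int = 7, dropout: float = 0.5):
        super().__init__()

        def block(in_ch: int, out_ch: int) -> nn.Sequential:
            # Conv -> BatchNorm -> ReLU -> 2x2 max-pool.
            # Kernel size and pooling are assumptions, not taken from the project.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        # Filter counts grow 32 -> 64 -> 128 while spatial size halves per block.
        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        # Assumed head: global average pooling, 50% dropout, one linear layer.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SmallAnimalCNN()
model.eval()  # disable dropout and use BatchNorm running statistics
with torch.no_grad():
    logits = model(torch.randn(4, 3, 128, 128))  # batch of 4 random RGB images
print(logits.shape)  # torch.Size([4, 7])
```

Each block halves the spatial resolution, so a 128×128 input reaches the classifier as a 128-channel 16×16 feature map before pooling.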
A CNN was the right fit for 2,392 images, since Vision Transformers (ViTs) typically need far more data to train from scratch. Random rotation, colour jitter, and RandomErasing compensated for the limited variety in the training set, while BatchNorm, AdamW, and 50% dropout kept the small model from memorising it.
Iterative refinement raised accuracy from a 58% baseline to 91.43%. Each step isolated a single variable, such as architecture, optimizer, resolution, or augmentation, so that its effect could be measured on its own.