Question 3

We start by importing everything we will be using to work with the CNN.

In [1]:
import torch
from PIL import Image
from torchvision import transforms, models

Download the ImageNet class list

In [2]:
!wget -O imagenet_classes.txt https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt?fbclid=IwAR19mHA3rPwm_4OynZs_G4oUG9qVhK33aMM7Z2ASLxNUChPp4LE6-V0GQ9Q
--2020-04-02 19:35:01--  https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt?fbclid=IwAR19mHA3rPwm_4OynZs_G4oUG9qVhK33aMM7Z2ASLxNUChPp4LE6-V0GQ9Q
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.124.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.124.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21674 (21K) [text/plain]
Saving to: ‘imagenet_classes.txt’

imagenet_classes.tx 100%[===================>]  21.17K  --.-KB/s    in 0.02s   

2020-04-02 19:35:01 (1.11 MB/s) - ‘imagenet_classes.txt’ saved [21674/21674]

Loading the classes

Start by loading the classes from the file. I print the first five of them just as a sanity check.

In [3]:
with open('imagenet_classes.txt') as f: #read the categories from file
    classes = [line.strip() for line in f.readlines()]
print(classes[0:5])
['tench, Tinca tinca', 'goldfish, Carassius auratus', 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias', 'tiger shark, Galeocerdo cuvieri', 'hammerhead, hammerhead shark']
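
Optionally, we can also check that all 1,000 ImageNet categories were read; a quick sketch (not required for anything below):

print(len(classes)) #expected: 1000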

Loading the Image

We load the image using PIL.

In [4]:
image = Image.open('./WelshCorgi.jpeg') #load image
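
One small precaution worth noting: the normalization below assumes three color channels, so a grayscale or RGBA input would have to be converted first. A hedged sketch (likely a no-op here, since most JPEGs are already RGB):

image = image.convert('RGB') #ensure three channels for the transforms below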

Next we define a helper function that transforms the image to the dimensions AlexNet expects, converts it to a tensor, and normalizes the pixel values with the per-channel means and standard deviations of the ImageNet dataset (without this normalization the pretrained weights would see data on a different scale than they were trained on, and the results wouldn't be very useful). The final step adds an extra dimension to turn the single image into a "batch" of size one.

In [5]:
def preprocess(image):
    transform = transforms.Compose([
        transforms.Resize(256), #resize the shorter side to 256
        transforms.CenterCrop(224), #crop the data to 224x224
        transforms.ToTensor(), #convert to tensor
        transforms.Normalize( #normalize data with imagenet mean and std dev
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
        )
    ])
    img_n = transform(image) #apply the transform
    return torch.unsqueeze(img_n, 0) #add a batch dimension at the front

Now we have to apply these changes to the image we will be using.

In [6]:
data = preprocess(image)
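
As an optional sanity check, the preprocessed batch should have shape [1, 3, 224, 224]: one image, three channels, and the 224x224 center crop.

print(data.shape) #expected: torch.Size([1, 3, 224, 224])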

Loading AlexNet (random initialization)

First we load AlexNet without pretrained weights; the weights will be initialized to random values.

In [7]:
alex_net = models.alexnet() #random weights
alex_net.eval() #set to evaluation mode
Out[7]:
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Let's output the number of input and output features of the last layer.

In [8]:
last_layer = alex_net.classifier[-1] #final Linear layer of the classifier
print("Last layer has", last_layer.in_features, "input features and", last_layer.out_features, "output features")
Last layer has 4096 input features and 1000 output features
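
As a side note, summing the element counts of every parameter tensor gives the overall size of the network; a minimal sketch (the total should be roughly 61 million for torchvision's AlexNet):

total = sum(p.numel() for p in alex_net.parameters())
print("Total parameters:", total)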

Now we are ready to get a prediction from the CNN.

In [9]:
image_pred = alex_net(data) #get a prediction
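
Since we are only doing inference, the forward pass could also be wrapped in torch.no_grad() so PyTorch skips gradient bookkeeping; a minimal equivalent sketch:

with torch.no_grad():
    image_pred = alex_net(data) #same prediction, no autograd graph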

We define a quick function to summarize the top 5 predictions and print them out for us.

In [10]:
def summarize(out): #helper util to summarize the top 5 predictions
    indices = torch.argsort(out, dim=1, descending=True)
    prob = torch.nn.functional.softmax(out, dim=1)[0] * 100
    top = [(classes[index], prob[index].item()) for index in indices[0][:5]]
    for val in top:
        print("Prediction:",val[0], "Probability",val[1])

So what is our picture?

In [11]:
summarize(image_pred)
Prediction: basset, basset hound Probability 0.10284849256277084
Prediction: guacamole Probability 0.10280108451843262
Prediction: airship, dirigible Probability 0.10280008614063263
Prediction: wire-haired fox terrier Probability 0.10279516875743866
Prediction: theater curtain, theatre curtain Probability 0.10271072387695312

Pretrained AlexNet

We'll reuse the functions above with an AlexNet pretrained on ImageNet data.

In [12]:
net = models.alexnet(pretrained=True) #load pretrained ImageNet weights
net.eval() #set to evaluation mode
Out[12]:
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
In [13]:
image_pred2 = net(data) #get a prediction
In [14]:
summarize(image_pred2) #get summary
Prediction: Pembroke, Pembroke Welsh corgi Probability 89.18708801269531
Prediction: Cardigan, Cardigan Welsh corgi Probability 10.649338722229004
Prediction: kelpie Probability 0.04727327823638916
Prediction: Eskimo dog, husky Probability 0.04288318008184433
Prediction: Shetland sheepdog, Shetland sheep dog, Shetland Probability 0.017419634386897087

These are much better predictions!