Building a Simple Convolutional Neural Network (CNN) with PyTorch

In the world of deep learning, Convolutional Neural Networks (CNNs) have revolutionized computer vision tasks, including image classification. In this blog post, we will explore how to code a simple CNN using the PyTorch framework. By the end of this tutorial, you'll have a basic understanding of building and training a CNN.

Let's start by importing the necessary libraries:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms

We will be using the torch library for creating and training the neural network, as well as the torchvision library for loading the MNIST dataset.

Creating the CNN Architecture

In this example, we will create a CNN with two convolutional layers, each followed by max pooling, and a final fully connected layer for classification.

class CNN(nn.Module):
    
    def __init__(self, in_channels=1, num_classes=10):
        super(CNN, self).__init__()
        # Use the constructor argument rather than a hard-coded channel count
        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        self.pool = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        self.fc1 = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc1(x)
        return x

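The CNN class extends nn.Module, the base class for all neural network modules in PyTorch. We define the layers in the __init__ method and implement the forward pass in the forward method. The network consists of two convolutional layers, each followed by a ReLU activation and the same max pooling layer. Because each 3x3 convolution uses a padding of 1, it preserves the spatial size, while each 2x2 pooling step halves it, so a 28x28 MNIST image shrinks to 14x14 and then to 7x7. With 16 output channels from conv2, the flattened tensor has 16 * 7 * 7 = 784 features, which is the input size of the final fully connected layer.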
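To check the input size of fc1 by hand, the spatial size can be traced through the layers. The sketch below is plain Python rather than PyTorch. The helper name conv_output_size is made up for illustration, and the code assumes square 28x28 MNIST inputs and the usual output-size formula floor((n + 2p - k) / s) + 1.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                                      # MNIST images are 28x28
size = conv_output_size(size, kernel=3, stride=1, padding=1)   # conv1 keeps 28
after_conv1 = size
size = conv_output_size(size, kernel=2, stride=2)              # pool halves to 14
after_pool1 = size
size = conv_output_size(size, kernel=3, stride=1, padding=1)   # conv2 keeps 14
size = conv_output_size(size, kernel=2, stride=2)              # pool halves to 7
flattened_features = 16 * size * size                          # 16 channels from conv2

print(after_conv1, after_pool1, size, flattened_features)      # prints: 28 14 7 784
```

If you change the kernel size, padding, or number of pooling steps, the same arithmetic tells you what the first argument of nn.Linear has to become.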

Setting Up the Environment and Hyperparameters

Before we proceed, we need to set up some environment configurations. We will define the device on which the network will be trained, whether it is a GPU or CPU. We will also set some hyperparameters for training the network.

# Set device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hyperparameters
num_classes = 10
learning_rate = 0.001
batch_size = 64
num_epochs = 2

Here, we check for the availability of a GPU and assign the appropriate device. We also specify the hyperparameters: the number of classes, learning rate, batch size, and number of epochs.

Loading the Dataset

Next, we will load the MNIST dataset. PyTorch provides a convenient API to download and load common datasets like MNIST.

# Load the dataset
train_dataset = datasets.MNIST(root='datasets/', train=True, transform=transforms.ToTensor(), download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = datasets.MNIST(root='datasets/', train=False, transform=transforms.ToTensor(), download=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

We create two Dataset objects for training and testing, specifying the root directory for storing the dataset, whether it is the training or the test split, and the transformation to be applied to the data (converting images to tensors). We also create two DataLoader objects, which allow us to iterate over the dataset in batches. The batch_size parameter determines the number of samples in each batch. For the training loader, shuffle=True reshuffles the data at the start of every epoch so that the batches differ from one epoch to the next. The test loader does not need shuffling, because the order of the samples does not affect the measured accuracy.
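As a quick check on what batch_size implies, the number of batches per epoch can be worked out directly. This sketch assumes the standard MNIST split of 60,000 training and 10,000 test images, and the DataLoader default of keeping the last, smaller batch (drop_last=False).

```python
import math

batch_size = 64
num_train_samples = 60_000  # standard MNIST training split (assumed)
num_test_samples = 10_000   # standard MNIST test split (assumed)

# The last batch is kept even if it is smaller, so round up
train_batches = math.ceil(num_train_samples / batch_size)
test_batches = math.ceil(num_test_samples / batch_size)

# Size of the final, partial training batch
last_train_batch = num_train_samples - (train_batches - 1) * batch_size

print(train_batches, test_batches, last_train_batch)  # prints: 938 157 32
```

So one pass over the training set takes 938 optimizer steps with these settings, and the final batch holds only 32 images.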

Initializing the Network, Loss Function, and Optimizer

Now, we will initialize our neural network model, define the loss function, and choose an optimizer for training the network.

# Initialize the network
model = CNN(in_channels=1, num_classes=num_classes).to(device)
# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

We create an instance of our CNN class and move it to the chosen device using the .to(device) method, so that computations run on the GPU when one is available and on the CPU otherwise. We use nn.CrossEntropyLoss as our loss function since we are dealing with a multi-class classification task. It expects raw, unnormalized scores (logits) and integer class labels, which is why the forward method does not apply a softmax to its output. The optim.Adam optimizer receives the model's parameters and will update them during training.
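To make the loss less of a black box, here is a minimal plain-Python sketch of the quantity cross-entropy computes for a single sample: the negative log of the softmax probability assigned to the true class. The function name cross_entropy and the three-class logits are made up for illustration. nn.CrossEntropyLoss does the same thing per sample and, by default, averages the result over the batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    prob_target = exps[target] / sum(exps)
    return -math.log(prob_target)


logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-class problem

loss_correct = cross_entropy(logits, target=0)  # true class has the highest score
loss_wrong = cross_entropy(logits, target=2)    # true class has the lowest score

print(round(loss_correct, 3), round(loss_wrong, 3))
```

The loss is small when the highest score belongs to the true class and grows as the true class receives a lower score, which is the signal the optimizer uses to adjust the weights.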

Training the Network

Now, let's train our CNN by iterating over the dataset for the specified number of epochs, performing forward and backward passes, and updating the model's weights.

# Train the network
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        # Get data to the device
        data = data.to(device)
        targets = targets.to(device)

        # Forward pass
        scores = model(data)
        loss = loss_fn(scores, targets)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Within each epoch, we iterate over the batches in the training data and move the data and targets to the device. We then compute the forward pass by passing the data through our CNN model and calculate the loss. Before the backward pass, optimizer.zero_grad() clears the gradients left over from the previous batch, because PyTorch accumulates gradients by default. Next, loss.backward() computes the gradients of the loss with respect to the model's parameters. Finally, we update the parameters using the optimizer's step() method.
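The role of optimizer.zero_grad() is easiest to see on a toy problem. The sketch below is not part of the MNIST training loop: it uses a single made-up parameter w, a made-up loss (w - 3)^2, and plain SGD instead of Adam so that the numbers are easy to follow.

```python
import torch

# A single learnable parameter and a toy loss, loss = (w - 3)^2
w = torch.tensor([0.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)


def toy_loss(param):
    return ((param - 3.0) ** 2).sum()


# Two backward passes without clearing: the gradients add up
toy_loss(w).backward()
first_grad = w.grad.clone()        # d/dw (w - 3)^2 at w = 0 is -6
toy_loss(w).backward()
accumulated_grad = w.grad.clone()  # -6 + -6 = -12

# The usual pattern: clear, recompute, then step
optimizer.zero_grad()
toy_loss(w).backward()
fresh_grad = w.grad.clone()        # back to -6
optimizer.step()                   # w = 0 - 0.1 * (-6) = 0.6

print(first_grad.item(), accumulated_grad.item(), fresh_grad.item(), w.item())
```

Without the zero_grad() call, every batch would be updated with the sum of all gradients seen so far rather than the gradient of the current batch.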

Evaluating the Model

After training the network, we want to evaluate its performance on both the training and testing datasets.

# Check accuracy on both train and test sets
def check_accuracy(loader, model):
    if loader.dataset.train:
        print("Checking accuracy on the train dataset...")
    else:
        print("Checking accuracy on the test dataset...")

    num_correct = 0
    num_samples = 0
    model.eval()

    with torch.no_grad():
        for x, y in loader:
            x = x.to(device)
            y = y.to(device)

            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)

        accuracy = float(num_correct) / float(num_samples) * 100
        print("Accuracy: {:.2f}%".format(accuracy))

    model.train()
    return

# Check accuracy on both train and test datasets
check_accuracy(train_loader, model)
check_accuracy(test_loader, model)

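The check_accuracy function takes a data loader and a model as input and calculates the accuracy of the model's predictions on the given dataset. We iterate over the dataset, move each batch to the device, obtain the predicted scores from the model, take the class with the highest score as the prediction, count the correct predictions, and compute the overall accuracy. Calling model.eval() switches layers such as dropout and batch normalization to their evaluation behavior. Our small CNN has neither, but the call is a good habit. Gradient tracking is disabled separately by the torch.no_grad() context, which saves memory and speeds up inference. At the end, model.train() puts the model back into training mode.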
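The prediction step, scores.max(1), can be tried on a small hand-made tensor. The scores and labels below are invented for illustration, with four samples and three classes. max(1) returns both the maximum values and their indices along dimension 1, and the indices are the predicted classes.

```python
import torch

# Made-up scores for 4 samples and 3 classes
scores = torch.tensor([
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [2.2, 2.5, 0.0],
])
labels = torch.tensor([1, 0, 2, 0])

# max over dim 1 returns (values, indices); the indices are the predicted classes
_, preds = scores.max(1)

num_correct = (preds == labels).sum().item()
accuracy = num_correct / labels.size(0) * 100

print(preds.tolist(), num_correct, accuracy)  # prints: [1, 0, 2, 1] 3 75.0
```

The last sample is misclassified because class 1 has a slightly higher score than the true class 0, so three of the four predictions are correct.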

Conclusion

Congratulations! You've successfully coded a simple Convolutional Neural Network using the PyTorch framework. We've covered the steps from defining the network architecture to training and evaluating the model. By understanding this example, you now have a solid foundation to build more complex CNN models and tackle various computer vision tasks.

Feel free to experiment with different architectures, hyperparameters, and datasets to enhance your understanding and create more powerful CNNs. The possibilities are endless!

Happy coding and keep exploring the fascinating world of deep learning with PyTorch! 🚀🔬🤖

References:
https://pytorch.org/
https://pytorch.org/tutorials/
https://www.youtube.com/watch?v=Jy4wM2X21u0&list=PLhhyoLH6IjfxeoooqP9rhU3HJIAVAJ3Vz&index=4&ab_channel=AladdinPersson
