Chapter 4: Building Neural Networks with PyTorch

Abstract:

Building neural networks with PyTorch typically involves defining a model, preparing data, and then training the model.

1. Defining the Neural Network Model:
  • Inherit from nn.Module
    Create a class for your neural network that inherits from torch.nn.Module. This provides essential functionality for managing layers and parameters.
  • Initialize Layers in __init__
    Define the individual layers of your network (e.g., nn.Linear for fully connected layers, nn.Conv2d for convolutional layers, nn.ReLU for activation functions) within the __init__ method.
  • Define Forward Pass in forward
    Implement the forward method, which dictates how data flows through the defined layers to produce an output.
Example of a simple feedforward network:
import torch
from torch import nn

class SimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
2. Data Preparation:
  • Load and Transform Data
    Load your dataset (e.g., using torchvision.datasets for image data) and apply necessary transformations (e.g., torchvision.transforms to resize, normalize, or convert to tensors).
  • Create DataLoaders
    Wrap your datasets in torch.utils.data.DataLoader for efficient batching and iteration during training.
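As an illustrative sketch of both steps (using MNIST as the example dataset; any torchvision dataset follows the same pattern):

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor(),                        # convert PIL images to tensors
    transforms.Normalize((0.1307,), (0.3081,)),   # standard MNIST mean and std
])
train_data = datasets.MNIST(root="data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)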
3. Training the Neural Network:
  • Instantiate the Model
    Create an instance of your defined neural network. Move it to a GPU if available using .to(device).
  • Define Loss Function
    Choose an appropriate loss function (e.g., nn.CrossEntropyLoss for classification, nn.MSELoss for regression).
  • Define Optimizer
    Select an optimizer (e.g., torch.optim.SGD or torch.optim.Adam) and pass it your model's parameters.
  • Training Loop (see the sketch after this list):
    • Iterate through your DataLoader for a specified number of epochs.
    • For each batch:
      • Zero out stale gradients with optimizer.zero_grad().
      • Perform a forward pass through the model to get predictions.
      • Calculate the loss from the predictions and the true labels.
      • Call loss.backward() to backpropagate and compute gradients.
      • Update the model parameters with optimizer.step().
  • Evaluation
    Periodically evaluate the model on a validation or test set to monitor progress and detect overfitting.
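A minimal sketch that ties these steps together (train_loader is assumed to be a DataLoader like the one sketched above; the model is the SimpleNeuralNetwork defined earlier):

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SimpleNeuralNetwork().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

num_epochs = 5
for epoch in range(num_epochs):
    for X, y in train_loader:           # train_loader is assumed to exist
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()           # clear gradients from the previous batch
        logits = model(X)               # forward pass
        loss = criterion(logits, y)     # compare predictions with labels
        loss.backward()                 # backpropagate to compute gradients
        optimizer.step()                # update parameters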


Learning Objectives

After completing this chapter, you will be able to:

  • Understand how neural networks are structured and represented in PyTorch.

  • Use the torch.nn module to create and manage neural network layers.

  • Apply activation and loss functions effectively.

  • Implement forward and backward passes manually and automatically.

  • Initialize network parameters and train a simple feedforward neural network using real data.


4.1 The torch.nn Module

The torch.nn module is one of the core components of PyTorch that simplifies the creation and training of neural networks. It provides classes and functions for building layers, defining activation functions, calculating loss, and managing model parameters.

The main abstraction in torch.nn is the nn.Module class. Every neural network you define in PyTorch should inherit from this class.

Key Features of torch.nn

  • Encapsulates layers and operations in reusable modules.

  • Automatically registers parameters (weights, biases, etc.).

  • Defines a forward() method for computation.

  • Works seamlessly with the torch.optim package for optimization.

Example: Basic nn.Module Structure

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.linear = nn.Linear(10, 2)  # 10 input features, 2 output features

    def forward(self, x):
        return self.linear(x)

model = SimpleModel()
print(model)

Output:

SimpleModel(
  (linear): Linear(in_features=10, out_features=2, bias=True)
)

Here, the model has one linear layer. The weights and biases are automatically registered and will be optimized during training.
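You can verify this registration directly. For the model above:

for name, param in model.named_parameters():
    print(name, param.shape)
# linear.weight torch.Size([2, 10])
# linear.bias torch.Size([2])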


4.2 Layers, Activation Functions, and Loss Functions

(a) Layers

Layers are the building blocks of neural networks. They transform inputs into outputs using learned weights.

Some common layer types:

  • nn.Linear — Fully connected layer.

  • nn.Conv2d — 2D convolutional layer for images.

  • nn.RNN, nn.LSTM — Recurrent layers for sequential data.

  • nn.BatchNorm2d — Batch normalization for stabilizing training.

Example:

layer = nn.Linear(4, 3)  # Input size = 4, Output size = 3
x = torch.randn(1, 4)
y = layer(x)
print(y)
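The other layer types follow the same call pattern. As a sketch, a 2D convolution maps a batch of images with shape (N, C, H, W) to a new channel dimension:

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
images = torch.randn(8, 3, 32, 32)  # batch of 8 RGB images, 32x32 pixels
features = conv(images)
print(features.shape)               # torch.Size([8, 16, 32, 32])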

(b) Activation Functions

Activation functions introduce non-linearity to neural networks, enabling them to learn complex mappings.

Common activation functions in PyTorch:

  • Sigmoid (nn.Sigmoid() or torch.sigmoid()): squashes input to the range (0, 1).

  • ReLU (nn.ReLU() or F.relu()): sets negative inputs to zero.

  • Tanh (nn.Tanh() or torch.tanh()): maps input to (-1, 1).

  • LeakyReLU (nn.LeakyReLU()): allows a small slope for negative inputs, avoiding the "dying ReLU" problem.

Example:

activation = nn.ReLU()
input_data = torch.tensor([-1.0, 0.0, 2.0])
output_data = activation(input_data)
print(output_data)

Output:

tensor([0., 0., 2.])
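The list above also mentions the functional API (torch.nn.functional, conventionally imported as F). It offers the same operations as stateless functions, which is convenient inside forward() when the activation has no learnable parameters:

import torch.nn.functional as F

output_data = F.relu(torch.tensor([-1.0, 0.0, 2.0]))
print(output_data)  # tensor([0., 0., 2.])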

(c) Loss Functions

Loss functions measure the difference between predictions and target values. They guide the optimization process.

Common loss functions:

  • Regression: Mean Squared Error, nn.MSELoss().

  • Binary classification: Binary Cross Entropy, nn.BCELoss().

  • Multi-class classification: Cross Entropy, nn.CrossEntropyLoss().

Example:

criterion = nn.MSELoss()
pred = torch.tensor([0.5, 0.8, 0.1])
target = torch.tensor([1.0, 0.0, 0.0])
loss = criterion(pred, target)
print(loss)

Output:

tensor(0.3000)
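One common pitfall with nn.CrossEntropyLoss is worth a sketch: it expects raw logits (it applies log-softmax internally) and integer class indices as targets, not one-hot vectors:

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1]])  # raw scores for 3 classes, no softmax
target = torch.tensor([0])                # index of the correct class
loss = criterion(logits, target)
print(loss)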

4.3 Forward and Backward Passes

Forward Pass

The forward pass involves feeding inputs through the model to compute predictions.

output = model(input_data)

Internally, this calls the model’s forward() method.

Backward Pass

The backward pass computes gradients for each parameter with respect to the loss function. PyTorch uses automatic differentiation to do this.

loss.backward()  # Computes gradients
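To see automatic differentiation in isolation, here is a minimal sketch with a scalar function (the values are illustrative):

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # y = x^2 + 3x
y.backward()         # autograd computes dy/dx = 2x + 3
print(x.grad)        # tensor(7.) at x = 2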

Example:

# Reuse the SimpleModel (10 inputs, 2 outputs) defined in Section 4.1
x = torch.randn(1, 10)
y = torch.tensor([[1.0, 0.0]])

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Forward pass
output = model(x)
loss = criterion(output, y)

# Backward pass
optimizer.zero_grad()  # Clear previous gradients
loss.backward()        # Compute new gradients
optimizer.step()       # Update parameters

4.4 Model Initialization and Parameters

Every nn.Module stores its learnable parameters in model.parameters(). You can view or modify them directly.

Accessing Parameters

for name, param in model.named_parameters():
    print(name, param.size())

Custom Initialization

You can use the torch.nn.init module to initialize weights and biases manually.

Example:

import torch.nn.init as init

for name, param in model.named_parameters():
    if 'weight' in name:
        init.xavier_uniform_(param)
    elif 'bias' in name:
        init.zeros_(param)

This initializes weights using the Xavier Uniform method, which helps maintain balanced gradients during training.
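An equivalent and common pattern applies an initialization function recursively to every submodule with Module.apply(); a sketch for linear layers:

def init_weights(m):
    if isinstance(m, nn.Linear):
        init.xavier_uniform_(m.weight)
        init.zeros_(m.bias)

model.apply(init_weights)  # applies init_weights to model and all submodules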


4.5 Practical Example: A Simple Feedforward Neural Network

Let’s build and train a simple Feedforward Neural Network (FNN) for binary classification using synthetic data.

Step 1: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim

Step 2: Define the Model

class FeedforwardNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(FeedforwardNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out

Step 3: Prepare Data

# Example input: 4 features per sample
X = torch.randn(100, 4)
y = torch.randint(0, 2, (100, 1)).float()

Step 4: Initialize Model, Loss, and Optimizer

model = FeedforwardNN(4, 8, 1)
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

Step 5: Training Loop

num_epochs = 100

for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Step 6: Model Evaluation

with torch.no_grad():                     # disable gradient tracking for inference
    predicted = (model(X) > 0.5).float()  # threshold sigmoid outputs at 0.5
    accuracy = (predicted == y).float().mean().item()
    print(f'Accuracy: {accuracy:.2f}')

4.6 Summary

  • The torch.nn module simplifies network construction and training.

  • Layers, activation functions, and loss functions are the core components of any neural network.

  • Forward and backward passes form the basis of learning in PyTorch.

  • Proper parameter initialization improves convergence.

  • A feedforward neural network can be easily built and trained using just a few lines of PyTorch code.


4.7 Exercises

  1. Conceptual Questions

    1. What is the purpose of the forward() method in a PyTorch model?

    2. Explain the difference between nn.ReLU() and F.relu().

    3. Why is weight initialization important in neural networks?

    4. Describe the role of a loss function during training.

    5. How does PyTorch compute gradients automatically?

  2. Programming Tasks

    1. Modify the feedforward neural network to include two hidden layers.

    2. Use nn.Tanh() as an activation function and compare the results with ReLU.

    3. Implement a regression model using nn.MSELoss() and synthetic data.

    4. Visualize the loss curve over epochs using matplotlib.

    5. Experiment with different optimizers (SGD, Adam, RMSprop) and observe their performance.
