Chapter 6: Model Training Workflow with PyTorch

Abstract:

The PyTorch model training workflow typically follows a series of fundamental steps to prepare data, define a model, train it, evaluate its performance, and finally, save and load it for future use.
1. Getting Data Ready:
This initial stage involves preparing your dataset for training. This includes:
  • Data Loading: 
    Using torch.utils.data.Dataset to represent your data and torch.utils.data.DataLoader to efficiently load and batch it.
  • Preprocessing: 
    Cleaning, transforming, and augmenting your data as needed (e.g., normalization, resizing images).
2. Defining and Building a Model:
This step involves creating the neural network architecture that will learn patterns from your data.
  • Model Definition: 
    Subclassing torch.nn.Module to define the layers and forward pass of your model.
  • Loss Function: 
    Choosing an appropriate loss function (e.g., nn.MSELoss for regression, nn.CrossEntropyLoss for classification) to quantify the difference between predictions and ground truth.
  • Optimizer: 
    Selecting an optimizer (e.g., torch.optim.SGD or torch.optim.Adam) to update the model's parameters based on the calculated loss.
3. Fitting the Model to Data (Training Loop):
This is the core of the training process, where the model learns from the training data.
  • Epochs: Iterating through the entire dataset multiple times (epochs).
  • Batches: Processing data in smaller chunks (batches) for efficient computation.
  • Forward Pass: Feeding input data through the model to generate predictions.
  • Loss Calculation: Computing the loss based on predictions and ground truth.
  • Backward Pass (Backpropagation): Calculating gradients of the loss with respect to model parameters.
  • Optimizer Step: Updating model parameters using the chosen optimizer.
  • Zeroing Gradients: Clearing gradients after each optimization step to prevent accumulation.
4. Making Predictions and Evaluating the Model:
After training, the model's performance needs to be assessed.
  • Inference: 
    Using the trained model to make predictions on unseen data (test or validation set).
  • Evaluation Metrics: 
    Calculating relevant metrics (e.g., accuracy, precision, recall, F1-score for classification; R-squared, MAE for regression) to quantify model performance.
  • Validation Loop: 
    Often, a separate validation loop is run during training to monitor performance and detect overfitting.
5. Saving and Loading the Model:
Once trained, the model can be saved for later use or deployment.
  • Saving Model State: Using torch.save() to save the model's state_dict (containing learned parameters).
  • Loading Model State: Loading the saved state_dict into a new model instance (see the sketch below).
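
A minimal sketch of this last step, using a small nn.Linear model and a placeholder filename purely for illustration:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# Save only the learned parameters (the state_dict), not the whole model object
torch.save(model.state_dict(), "model_weights.pth")

# To load: recreate the architecture, then load the saved parameters into it
restored = nn.Linear(10, 2)
restored.load_state_dict(torch.load("model_weights.pth"))
restored.eval()  # switch to evaluation mode before inference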

The remainder of this chapter presents the PyTorch model training workflow in full, with learning objectives, worked examples, and exercises.


Chapter 6: Model Training Workflow

Learning Objectives

After completing this chapter, you will be able to:

  • Understand the essential steps in the model training workflow in PyTorch.

  • Design and implement the training loop for neural networks.

  • Select and apply appropriate loss functions for various machine learning tasks.

  • Use different optimizers such as SGD, Adam, and RMSProp effectively.

  • Apply learning rate scheduling to improve training performance.

  • Evaluate model performance using common evaluation metrics.


6.1 The Training Loop

A training loop is the backbone of model development in PyTorch. It defines how the model learns from data over several epochs (iterations over the full dataset). Each epoch typically involves the following steps:

  1. Forward Pass – The model makes predictions for a batch of input data.

  2. Loss Computation – The predictions are compared to the ground truth using a loss function.

  3. Backward Pass – Gradients of the loss with respect to model parameters are computed using backpropagation.

  4. Parameter Update – The optimizer updates the parameters to reduce the loss.

  5. Evaluation (optional) – The model is tested on validation data to monitor progress.

Typical Structure of a Training Loop

import torch
import torch.nn as nn
import torch.optim as optim

# Example: Simple classification model
model = nn.Linear(10, 2)  # input size 10, output size 2
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Dummy data
inputs = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))

# Training loop
for epoch in range(10):  # 10 epochs
    optimizer.zero_grad()              # Reset gradients (PyTorch accumulates them by default)
    outputs = model(inputs)            # Forward pass: compute predictions
    loss = criterion(outputs, labels)  # Compute the loss
    loss.backward()                    # Backward pass: compute gradients
    optimizer.step()                   # Update parameters
    
    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")

Key Points

  • Always zero out gradients before backpropagation (optimizer.zero_grad()), as PyTorch accumulates them by default.

  • loss.backward() computes the gradients.

  • optimizer.step() applies the parameter update based on the computed gradients.


6.2 Loss Functions in Detail

A loss function (or cost function) quantifies how far the model's predictions are from the actual target values. The goal of training is to minimize this loss.

Common Loss Functions

  • Regression – nn.MSELoss(): Mean Squared Error; the average squared difference between predicted and true values.

  • Regression – nn.L1Loss(): Mean Absolute Error; the average absolute difference between predicted and true values.

  • Multi-class classification – nn.CrossEntropyLoss(): Combines LogSoftmax and NLLLoss for multi-class classification.

  • Binary classification – nn.BCELoss() or nn.BCEWithLogitsLoss(): Used for binary targets (0 or 1); the latter takes raw logits and is numerically more stable.

  • Custom – user-defined: Custom losses can be written with ordinary tensor operations.

Example: Mean Squared Error Loss

criterion = nn.MSELoss()
predicted = torch.tensor([2.5, 0.0, 2.1])
target = torch.tensor([3.0, -0.5, 2.0])
loss = criterion(predicted, target)
print(loss.item())  # Output: 0.17

Example: Cross-Entropy Loss

criterion = nn.CrossEntropyLoss()
outputs = torch.tensor([[2.0, 1.0, 0.1]])
labels = torch.tensor([0])
loss = criterion(outputs, labels)
print(loss.item())
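
Example: Binary Classification with BCEWithLogitsLoss

The list above also mentions nn.BCEWithLogitsLoss() for binary targets. The short sketch below (with made-up logits and 0/1 labels) shows how it is called; it expects raw logits and applies the sigmoid internally, which is why it is more numerically stable than nn.BCELoss().

criterion = nn.BCEWithLogitsLoss()
logits = torch.tensor([0.8, -1.2, 2.5])   # raw model outputs, no sigmoid applied
targets = torch.tensor([1.0, 0.0, 1.0])   # binary labels as floats
loss = criterion(logits, targets)
print(loss.item())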

Custom Loss Example

def custom_loss(pred, target):
    return torch.mean((pred - target)**2 + 0.1*torch.abs(pred - target))
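
As a quick check, this custom_loss can be called like any built-in criterion; here it reuses the made-up tensors from the MSE example above:

predicted = torch.tensor([2.5, 0.0, 2.1])
target = torch.tensor([3.0, -0.5, 2.0])
print(custom_loss(predicted, target).item())  # squared-error term plus a small absolute-error penalty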

6.3 Optimizers: SGD, Adam, RMSProp, etc.

An optimizer updates model parameters to minimize loss. It uses the gradients computed during backpropagation to decide how much to adjust each weight.

1. Stochastic Gradient Descent (SGD)

The simplest optimizer. It updates each parameter as

    θ ← θ − η · ∇_θ L

where

  • η is the learning rate, and

  • ∇_θ L is the gradient of the loss L with respect to the parameter θ.

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

Advantages: Simple, effective for small datasets.
Limitations: Can get stuck in local minima or converge slowly.


2. Adam (Adaptive Moment Estimation)

Adam combines momentum and adaptive learning rate concepts. It computes exponentially moving averages of gradients and squared gradients.

optimizer = optim.Adam(model.parameters(), lr=0.001)

Advantages: Works well for most problems, requires minimal tuning.
Limitations: Can overfit or oscillate if learning rate is not tuned.


3. RMSProp

RMSProp adapts the learning rate for each parameter individually, based on recent gradients.

optimizer = optim.RMSprop(model.parameters(), lr=0.001)

Advantages: Good for recurrent neural networks and noisy gradients.
Limitations: May require careful parameter tuning.


Comparison Summary

  • SGD – Adaptive LR: No; Momentum: Optional; Typical use: basic tasks and simple models.

  • Adam – Adaptive LR: Yes; Momentum: Yes (via moment estimates); Typical use: default choice for most deep learning tasks.

  • RMSProp – Adaptive LR: Yes; Momentum: Optional; Typical use: recurrent networks, time series, noisy gradients.

6.4 Learning Rate Scheduling

The learning rate (LR) controls the step size in parameter updates. A learning rate scheduler adjusts the LR during training to improve convergence.

1. StepLR

Decreases LR by a factor every few epochs.

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

2. ExponentialLR

Decreases LR exponentially after each epoch.

scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

3. ReduceLROnPlateau

Reduces LR when a metric stops improving.

scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')

Using a Scheduler

for epoch in range(30):
    # ... forward pass, loss computation, loss.backward(), optimizer.step() ...
    scheduler.step()   # step the scheduler once per epoch, after the optimizer update
    print(f"Epoch {epoch+1}, LR: {scheduler.get_last_lr()}")

6.5 Evaluation Metrics

After training, models must be evaluated to understand their performance. Metrics differ based on the type of task.

1. Classification Metrics

  • Accuracy: Ratio of correct predictions.

  • Precision: Fraction of true positives among predicted positives.

  • Recall: Fraction of true positives among actual positives.

  • F1-score: Harmonic mean of precision and recall.

Example:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1 Score:", f1_score(y_true, y_pred))

2. Regression Metrics

  • Mean Absolute Error (MAE)

  • Mean Squared Error (MSE)

  • R² Score

Example:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0]
y_pred = [2.5, 0.0, 2.1]

print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("R2 Score:", r2_score(y_true, y_pred))

6.6 Putting It All Together

Below is a complete workflow combining all the discussed concepts:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import accuracy_score

# Define model
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 2)
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize model, loss, optimizer, scheduler
model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

# Dummy data
inputs = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))

# Training loop
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    scheduler.step()

    # Evaluation on the training batch (predicted class = argmax over logits)
    _, preds = torch.max(outputs, 1)
    acc = accuracy_score(labels.numpy(), preds.numpy())
    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}, Acc: {acc:.4f}")

Summary

  • The training loop is the heart of model learning in PyTorch.

  • Loss functions measure prediction errors and guide optimization.

  • Optimizers adjust model parameters to minimize loss efficiently.

  • Learning rate schedulers dynamically tune the learning rate for better convergence.

  • Evaluation metrics quantify performance and ensure model reliability.


Exercises

  1. Write a PyTorch training loop using RMSProp and MSELoss for a regression task.

  2. Implement a custom loss function that combines MSE and MAE.

  3. Compare SGD and Adam on a small dataset and observe their convergence behavior.

  4. Use a StepLR scheduler and plot how the learning rate changes over epochs.

  5. Implement a function that computes precision, recall, and F1-score for a binary classifier.
