Chapter 6: Model Training Workflow with PyTorch
Abstract:
- Data Loading: Using torch.utils.data.Dataset to represent your data and torch.utils.data.DataLoader to efficiently load and batch it.
- Preprocessing: Cleaning, transforming, and augmenting your data as needed (e.g., normalization, resizing images).
- Model Definition: Subclassing torch.nn.Module to define the layers and forward pass of your model.
- Loss Function: Choosing an appropriate loss function (e.g., nn.MSELoss for regression, nn.CrossEntropyLoss for classification) to quantify the difference between predictions and ground truth.
- Optimizer: Selecting an optimizer (e.g., torch.optim.SGD, torch.optim.Adam) to update the model's parameters based on the calculated loss.
- Epochs: Iterating through the entire dataset multiple times (epochs).
- Batches: Processing data in smaller chunks (batches) for efficient computation.
- Forward Pass: Feeding input data through the model to generate predictions.
- Loss Calculation: Computing the loss based on predictions and ground truth.
- Backward Pass (Backpropagation): Calculating gradients of the loss with respect to model parameters.
- Optimizer Step: Updating model parameters using the chosen optimizer.
- Zeroing Gradients: Clearing gradients after each optimization step to prevent accumulation.
- Inference: Using the trained model to make predictions on unseen data (test or validation set).
- Evaluation Metrics: Calculating relevant metrics (e.g., accuracy, precision, recall, F1-score for classification; R-squared, MAE for regression) to quantify model performance.
- Validation Loop: Often, a separate validation loop is run during training to monitor performance and detect overfitting.
- Saving Model State: Using torch.save() to save the model's state_dict (containing learned parameters).
- Loading Model State: Loading the saved state_dict into a new model instance (see the sketch after this list).
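A minimal sketch of this save/load pattern, assuming a trained model instance (model) of the SimpleNet class defined later in this chapter and an illustrative filename model.pt:
import torch

# Save only the learned parameters (recommended over saving the whole model object)
torch.save(model.state_dict(), "model.pt")

# Later: recreate the architecture and load the parameters into it
restored = SimpleNet()
restored.load_state_dict(torch.load("model.pt"))
restored.eval()  # switch to evaluation mode before inference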
The rest of this chapter works through these topics in detail, with learning objectives, worked examples, and exercises.
Chapter 6: Model Training Workflow
Learning Objectives
After completing this chapter, you will be able to:
- Understand the essential steps in the model training workflow in PyTorch.
- Design and implement the training loop for neural networks.
- Select and apply appropriate loss functions for various machine learning tasks.
- Use different optimizers such as SGD, Adam, and RMSProp effectively.
- Apply learning rate scheduling to improve training performance.
- Evaluate model performance using common evaluation metrics.
6.1 The Training Loop
A training loop is the backbone of model development in PyTorch. It defines how the model learns from data over several epochs (iterations over the full dataset). Each epoch typically involves the following steps:
- Forward Pass – The model makes predictions for a batch of input data.
- Loss Computation – The predictions are compared to the ground truth using a loss function.
- Backward Pass – Gradients of the loss with respect to model parameters are computed using backpropagation.
- Parameter Update – The optimizer updates the parameters to reduce the loss.
- Evaluation (optional) – The model is tested on validation data to monitor progress.
Typical Structure of a Training Loop
import torch
import torch.nn as nn
import torch.optim as optim
# Example: Simple classification model
model = nn.Linear(10, 2)  # input size 10, output size 2
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Dummy data
inputs = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))
# Training loop
for epoch in range(10):  # 10 epochs
    optimizer.zero_grad()        # Step 1: Reset gradients
    outputs = model(inputs)      # Step 2: Forward pass
    loss = criterion(outputs, labels)  # Step 3: Compute loss
    loss.backward()              # Step 4: Backward pass
    optimizer.step()             # Step 5: Update parameters
    
    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")
Key Points
- Always zero out gradients before backpropagation (optimizer.zero_grad()), as PyTorch accumulates them by default.
- loss.backward() computes the gradients.
- optimizer.step() applies the parameter update based on the computed gradients.
 
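To see why zeroing matters, here is a small standalone check (not part of the chapter's running example) showing that gradients add up across backward() calls until they are cleared:
import torch

w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(3.0)

(w * x).backward()
print(w.grad)        # tensor(3.)

(w * x).backward()
print(w.grad)        # tensor(6.) -- the new gradient was added to the old one

w.grad.zero_()       # what optimizer.zero_grad() does for every parameter
print(w.grad)        # tensor(0.)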
6.2 Loss Functions in Detail
A loss function (or cost function) quantifies how far the model's predictions are from the actual target values. The goal of training is to minimize this loss.
Common Loss Functions
| Task Type | Loss Function | Description |
|---|---|---|
| Regression | nn.MSELoss() | Mean Squared Error: measures the average squared difference between predicted and true values. |
| Regression | nn.L1Loss() | Mean Absolute Error: measures the average absolute difference. |
| Classification | nn.CrossEntropyLoss() | Combines LogSoftmax and NLLLoss for multi-class classification. |
| Binary Classification | nn.BCELoss() or nn.BCEWithLogitsLoss() | Used for binary outputs (0 or 1); the latter is numerically more stable. |
| Custom | User-defined | You can define custom losses using tensor operations. |
Example: Mean Squared Error Loss
criterion = nn.MSELoss()
predicted = torch.tensor([2.5, 0.0, 2.1])
target = torch.tensor([3.0, -0.5, 2.0])
loss = criterion(predicted, target)
print(loss.item())  # ≈ 0.17, i.e. (0.25 + 0.25 + 0.01) / 3
Example: Cross-Entropy Loss
criterion = nn.CrossEntropyLoss()
outputs = torch.tensor([[2.0, 1.0, 0.1]])
labels = torch.tensor([0])
loss = criterion(outputs, labels)
print(loss.item())  # ≈ 0.417
Custom Loss Example
def custom_loss(pred, target):
    return torch.mean((pred - target)**2 + 0.1*torch.abs(pred - target))
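Such a function can be used exactly like a built-in criterion, since autograd differentiates through ordinary tensor operations; a quick check with illustrative values:
pred = torch.tensor([2.5, 0.0, 2.1], requires_grad=True)
target = torch.tensor([3.0, -0.5, 2.0])

loss = custom_loss(pred, target)
loss.backward()               # gradients flow through the custom expression
print(loss.item(), pred.grad)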
6.3 Optimizers: SGD, Adam, RMSProp, etc.
An optimizer updates model parameters to minimize loss. It uses the gradients computed during backpropagation to decide how much to adjust each weight.
1. Stochastic Gradient Descent (SGD)
The simplest optimizer. Updates parameters as:
\[
\theta = \theta - \eta \cdot \nabla_\theta L
\]
where
- \( \eta \) is the learning rate, and
- \( \nabla_\theta L \) is the gradient of the loss \( L \) with respect to the parameter \( \theta \).
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
Advantages: Simple, effective for small datasets.
Limitations: Can get stuck in local minima or converge slowly.
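To make the update rule concrete, the following sketch performs one plain SGD step by hand (no momentum), which is essentially what optim.SGD does internally; it assumes model, criterion, inputs, and labels from the earlier example and a learning rate lr = 0.01:
lr = 0.01
loss = criterion(model(inputs), labels)
model.zero_grad()                 # clear any previously accumulated gradients
loss.backward()                   # populate param.grad for every parameter

with torch.no_grad():             # parameter updates must not be tracked by autograd
    for param in model.parameters():
        if param.grad is not None:
            param -= lr * param.grad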
2. Adam (Adaptive Moment Estimation)
Adam combines momentum and adaptive learning rate concepts. It computes exponentially moving averages of gradients and squared gradients.
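Concretely, the standard Adam update (which optim.Adam implements, up to implementation details) keeps running moment estimates and rescales the step per parameter:
\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
\]
\[
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta \leftarrow \theta - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\]
where \( g_t \) is the gradient at step \( t \); PyTorch's defaults are \( \beta_1 = 0.9 \), \( \beta_2 = 0.999 \), and \( \epsilon = 10^{-8} \).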
optimizer = optim.Adam(model.parameters(), lr=0.001)
Advantages: Works well for most problems, requires minimal tuning.
Limitations: Can overfit or oscillate if learning rate is not tuned.
3. RMSProp
RMSProp adapts the learning rate for each parameter individually, based on recent gradients.
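Its update keeps an exponential moving average of squared gradients and divides the step by its square root:
\[
E[g^2]_t = \alpha\, E[g^2]_{t-1} + (1-\alpha)\, g_t^2, \qquad
\theta \leftarrow \theta - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\, g_t
\]
where \( \alpha \) is the smoothing constant (PyTorch default 0.99).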
optimizer = optim.RMSprop(model.parameters(), lr=0.001)
Advantages: Good for recurrent neural networks and noisy gradients.
Limitations: May require careful parameter tuning.
Comparison Summary
| Optimizer | Adaptive LR | Momentum | Typical Use | 
|---|---|---|---|
| SGD | ❌ | ✅ | Basic tasks | 
| Adam | ✅ | ✅ | Default choice for most deep learning tasks | 
| RMSProp | ✅ | ✅ | Recurrent networks, time series | 
6.4 Learning Rate Scheduling
The learning rate (LR) controls the step size in parameter updates. A learning rate scheduler adjusts the LR during training to improve convergence.
1. StepLR
Decreases the LR by a multiplicative factor (gamma) every step_size epochs. For example, with step_size=10 and gamma=0.1, an initial LR of 0.01 becomes 0.001 at epoch 10 and 0.0001 at epoch 20.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
2. ExponentialLR
Multiplies the LR by gamma after every epoch, giving exponential decay.
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
3. ReduceLROnPlateau
Reduces LR when a metric stops improving.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')
Using a Scheduler
# Works as-is for StepLR and ExponentialLR; ReduceLROnPlateau instead expects
# the monitored metric, e.g. scheduler.step(val_loss)
for epoch in range(30):
    # ... forward pass, loss, backward pass, optimizer.step() ...
    scheduler.step()
    print(f"Epoch {epoch+1}, LR: {scheduler.get_last_lr()}")
6.5 Evaluation Metrics
After training, models must be evaluated to understand their performance. Metrics differ based on the type of task.
1. Classification Metrics
- Accuracy: Proportion of correct predictions among all predictions.
- Precision: Fraction of true positives among predicted positives.
- Recall: Fraction of true positives among actual positives.
- F1-score: Harmonic mean of precision and recall.
 
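In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), these are:
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}
\]
\[
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]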
Example:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1 Score:", f1_score(y_true, y_pred))
2. Regression Metrics
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R² Score
 
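For predictions \( \hat{y}_i \) and targets \( y_i \) over \( n \) samples:
\[
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|, \qquad
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]
where \( \bar{y} \) is the mean of the targets.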
Example:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
y_true = [3.0, -0.5, 2.0]
y_pred = [2.5, 0.0, 2.1]
print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("R2 Score:", r2_score(y_true, y_pred))
6.6 Putting It All Together
Below is a complete workflow combining all the discussed concepts:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import accuracy_score
# Define model
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 2)
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
# Initialize model, loss, optimizer, scheduler
model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
# Dummy data
inputs = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))
# Training loop
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
    # Evaluation on the training batch (for illustration only)
    _, preds = torch.max(outputs, 1)
    acc = accuracy_score(labels.numpy(), preds.numpy())
    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}, Acc: {acc:.4f}")
Summary
- The training loop is the heart of model learning in PyTorch.
- Loss functions measure prediction errors and guide optimization.
- Optimizers adjust model parameters to minimize loss efficiently.
- Learning rate schedulers dynamically tune the learning rate for better convergence.
- Evaluation metrics quantify performance and ensure model reliability.
Exercises
- Write a PyTorch training loop using RMSProp and MSELoss for a regression task.
- Implement a custom loss function that combines MSE and MAE.
- Compare SGD and Adam on a small dataset and observe their convergence behavior.
- Use a StepLR scheduler and plot how the learning rate changes over epochs.
- Implement a function that computes precision, recall, and F1-score for a binary classifier.