Chapter 18: Debugging and Visualization with PyTorch

Abstract:

Debugging and visualization are crucial for developing and optimizing PyTorch models. Several tools and techniques facilitate these processes:
Debugging:
  • Standard Python Debuggers: 
    Integrated Development Environments (IDEs) like VS Code or PyCharm offer robust Python debugging capabilities, including setting breakpoints, stepping through code, inspecting variables, and evaluating expressions. In VS Code, stepping into PyTorch source code may require setting justMyCode to false in the debug configuration (launch.json).
  • Printing and Logging: 
    Simple print() statements or logging libraries can be used to inspect tensor values, shapes, and other relevant information at different stages of the model's execution.
  • PyTorch Hooks: 
    Forward and backward hooks can be registered on modules or tensors to inspect and even modify activations or gradients during the forward and backward passes. This is particularly useful for understanding gradient flow and identifying issues like vanishing or exploding gradients.
  • CommDebugMode: 
    For distributed training, CommDebugMode helps pinpoint collective communication operations and their origin within the model, aiding in debugging distributed issues.
Visualization:
  • TensorBoard: 
    A powerful visualization tool for tracking metrics, visualizing model graphs, inspecting activations and weights, and analyzing performance over training runs. PyTorch integrates well with TensorBoard through torch.utils.tensorboard.SummaryWriter.
  • Netron: 
    A viewer for neural network models, supporting various formats including PyTorch's ONNX export. It provides an interactive visualization of the model's architecture, including layers, connections, and input/output shapes.
  • Torchviz: 
    A library for visualizing PyTorch computation graphs, providing a visual representation of the data flow and operations within a model.
  • Weights & Biases (W&B): 
    A platform for experiment tracking, visualization, and collaboration, offering comprehensive tools for logging metrics, visualizing model performance, and debugging.
  • Debugging Image Viewer (e.g., in PyCharm): 
    Plugins like the Debug Image Viewer in PyCharm allow direct visualization of PyTorch tensors as images during debugging, which is beneficial for computer vision tasks.
  • Memory Snapshot Tool: 
    For GPU memory debugging, PyTorch's Memory Snapshot tool provides a detailed visualization of GPU memory allocations over time, helping identify memory leaks or inefficient memory usage.




Chapter 18: Debugging and Visualization

Learning Objectives

After completing this chapter, you will be able to:

  • Identify common issues encountered during training and testing deep learning models in PyTorch.

  • Apply systematic debugging techniques to diagnose and fix problems in model architecture, data, and optimization.

  • Use TensorBoard to visualize model performance, losses, and computational graphs.

  • Analyze gradients and weights to understand how the model learns and to detect potential training issues such as vanishing/exploding gradients.


18.1 Debugging Techniques in PyTorch

Even with well-structured code, debugging deep learning models can be challenging. Unlike traditional software bugs, neural network “bugs” often manifest as subtle numerical issues, such as gradients becoming NaN, the model failing to learn, or the loss failing to converge.

Common Sources of Errors

  1. Shape Mismatch:
    One of the most frequent sources of runtime errors in PyTorch.
    Example:

    logits = model(inputs)  # Output: [batch_size, 10]
    loss = criterion(logits, labels)  # labels: [batch_size, 1]
    

    Here, nn.CrossEntropyLoss expects class-index labels of shape [batch_size]; remove the extra dimension with labels.squeeze(1).

  2. Incorrect Device Usage:
    Mixing CPU and GPU tensors causes RuntimeError: Expected all tensors to be on the same device.
    Move the model and every batch to the same device:

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    inputs, labels = inputs.to(device), labels.to(device)
    
  3. Learning Rate Issues:

    • Too high → loss diverges, gradients explode.

    • Too low → model learns too slowly or appears stuck.

  4. Improper Data Normalization:
    Neural networks usually train more stably when inputs are normalized to a consistent scale.
    For images:

    transforms.Normalize(mean=[0.5], std=[0.5])
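
The mean and std of 0.5 above are generic values. As a minimal sketch, assuming the training images are available as one tensor (random stand-in data is used here, and the variable names are illustrative), per-channel statistics can be computed and applied with the same arithmetic that transforms.Normalize performs:

```python
import torch

torch.manual_seed(0)

# Stand-in for a batch of grayscale images with pixel values in [0, 1]
# (hypothetical data; substitute your real training images).
images = torch.rand(256, 1, 28, 28)

# Per-channel statistics over the batch and spatial dimensions.
mean = images.mean(dim=(0, 2, 3), keepdim=True)
std = images.std(dim=(0, 2, 3), keepdim=True)

# Same arithmetic as transforms.Normalize: (x - mean) / std, per channel.
normalized = (images - mean) / std

print(normalized.mean().item(), normalized.std().item())
```

After this step the data should have a mean near 0 and a standard deviation near 1. The same statistics would then be reused for validation and test data.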
    

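Returning to the shape mismatch in item 1, the following sketch reproduces the error with random stand-in tensors and then fixes it by dropping the extra label dimension. The tensor sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)             # [batch_size, num_classes]
labels = torch.randint(0, 10, (4, 1))   # [batch_size, 1]: one dimension too many

try:
    criterion(logits, labels)
    mismatch_raised = False
except (RuntimeError, ValueError) as err:
    mismatch_raised = True
    print("Shape error:", err)

# Drop the trailing dimension so labels has shape [batch_size].
loss = criterion(logits, labels.squeeze(1))
print(labels.squeeze(1).shape, loss.item())
```

Printing the shapes of both arguments just before the loss call is usually enough to spot this kind of mismatch.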
Step-by-Step Debugging Strategy

  1. Use Print Statements Judiciously
    Print intermediate shapes and values:

    print(inputs.shape, outputs.shape, loss.item())
    

    or verify specific tensor ranges:

    print(torch.min(outputs), torch.max(outputs))
    
  2. Check Gradient Flow
    After loss.backward(), check that gradients are neither all zero nor exploding:

    for name, param in model.named_parameters():
        if param.grad is not None:
            print(name, param.grad.abs().mean())
    
  3. Run Small Data Batches
    Train on a few samples to confirm that your model can overfit a tiny dataset; the training loss should fall close to zero.
    If it does not, there is likely a bug in the model, loss, optimizer setup, or data pipeline.

  4. Use torch.autograd.set_detect_anomaly(True)
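    With anomaly detection enabled, a backward function that produces NaN raises an error that includes the traceback of the forward operation responsible. It slows execution noticeably, so enable it only while debugging: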

    torch.autograd.set_detect_anomaly(True)
    loss.backward()
    
  5. Use Gradient Clipping
    If gradients explode, clip them after loss.backward() and before optimizer.step():

    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    
  6. Check Loss Function Compatibility
    Ensure correct pairing:

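    • nn.CrossEntropyLoss → raw logits (no softmax; the loss applies log-softmax internally) with integer class-index labels

    • nn.BCEWithLogitsLoss → binary/multi-label raw logits (no sigmoid; the loss applies it internally) with float targets of the same shape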

  7. Log Intermediate Metrics
    Use tools like TensorBoard (next section) to monitor:

    • Training/validation loss

    • Accuracy

    • Gradient magnitudes
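
The overfitting check from step 3 can be sketched as follows. This is a minimal example under stated assumptions: the small fully connected model, the random batch of 8 samples, the learning rate, and the step count are all illustrative stand-ins for your own model and data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model and data; substitute your own.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

# Train repeatedly on the same tiny batch and record the loss.
losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the last loss is not far below the first, the problem lies in the training code rather than in the amount of data.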


Example: Debugging a Simple Classifier

import torch
import torch.nn as nn
import torch.optim as optim

# Model Definition
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

# Initialize
model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

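# Enable anomaly detection once, before the forward pass, so that the
# forward traceback of any op producing NaN gradients is recorded.
# It slows training, so disable it after debugging.
torch.autograd.set_detect_anomaly(True)

# Debugging loop on random data (each "epoch" here is a single batch)
for epoch in range(5):
    inputs = torch.randn(64, 1, 28, 28)
    labels = torch.randint(0, 10, (64,))

    outputs = model(inputs)
    loss = criterion(outputs, labels)

    optimizer.zero_grad()
    loss.backward()

    # Print gradient statistics
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"{name} grad mean: {param.grad.abs().mean():.6f}")

    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")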
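
The classifier above trains without producing NaN values, so anomaly detection never fires. To see what it reports, the sketch below uses a deliberately contrived computation (not taken from the classifier) whose backward pass evaluates 0 / 0. It uses the context-manager form, torch.autograd.detect_anomaly(), which limits the check to one block of code.

```python
import torch

x = torch.tensor([0.0], requires_grad=True)

caught = False
message = ""
try:
    # Anomaly mode is active only inside this block.
    with torch.autograd.detect_anomaly():
        y = torch.sqrt(x) * 0.0   # backward of sqrt at 0 computes 0 / 0
        y.sum().backward()
except RuntimeError as err:
    caught = True
    message = str(err)

print(caught, message)
```

Without anomaly detection, the same backward pass completes silently and leaves NaN in x.grad, which is why such bugs can be hard to locate.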

18.2 Visualizing Neural Networks with TensorBoard

Visualization is crucial for understanding how your model learns over time. TensorBoard, originally developed for TensorFlow, can be used with PyTorch via torch.utils.tensorboard.

Setting Up TensorBoard

Install:

pip install tensorboard

Import:

from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter("runs/experiment1")

Visualizing Training Metrics

for epoch in range(10):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    writer.add_scalar('Training Loss', running_loss / len(trainloader), epoch)
    print(f"Epoch [{epoch+1}] Loss: {running_loss/len(trainloader):.4f}")

Start TensorBoard:

tensorboard --logdir=runs

Then visit: http://localhost:6006/

You’ll see:

  • Scalars: loss curves, accuracy

  • Graphs: computational graph visualization

  • Histograms: weights and gradients

  • Images: sample visualizations from dataset


Visualizing the Model Graph

sample_input = torch.randn(1, 1, 28, 28)
writer.add_graph(model, sample_input)

TensorBoard will render the network architecture, showing layer connections and tensor flow.


Visualizing Images and Predictions

import torchvision
images, labels = next(iter(trainloader))
img_grid = torchvision.utils.make_grid(images)
writer.add_image('MNIST_Images', img_grid)

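You can also log a batch of images for qualitative monitoring. Here predicted_images is assumed to be a tensor of shape [N, C, H, W], for example the inputs the model classified into a given class: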

writer.add_images('Predictions', predicted_images)

Histogram Visualization

Tracking the distributions of model parameters and gradients helps assess training stability.

for name, param in model.named_parameters():
    writer.add_histogram(name, param, epoch)
    # Gradients are None for parameters that have not received a backward pass
    if param.grad is not None:
        writer.add_histogram(f"{name}.grad", param.grad, epoch)

Histogram analysis helps detect:

  • Vanishing gradients (values near zero)

  • Exploding gradients (extreme peaks)

  • Dead neurons (gradients that stay exactly zero for a unit's weights)


18.3 Gradient and Weight Analysis

Understanding Gradient Flow

During backpropagation, gradients propagate backward from output to earlier layers.
Monitoring gradient statistics can reveal problems such as:

Problem              | Symptom                | Solution
Vanishing gradients  | Gradients close to 0   | Use ReLU activation, batch normalization
Exploding gradients  | Very large gradients   | Apply gradient clipping
Dead neurons         | Constant zero outputs  | Reduce learning rate or reinitialize weights
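
The "constant zero outputs" symptom can be checked with a forward hook, one of the PyTorch hook mechanisms. The following is a minimal sketch under stated assumptions: the small model is illustrative, one unit is made dead on purpose, and the names record_dead_units and stats are hypothetical helpers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 4))

# Force unit 0 of the first layer to be dead for illustration:
# zero weights and a negative bias give a negative pre-activation for any input.
with torch.no_grad():
    model[0].weight[0].zero_()
    model[0].bias[0] = -1.0

stats = {}

def record_dead_units(module, inputs, output):
    # A unit counts as dead on this batch if its output is zero for every sample.
    stats["dead_mask"] = (output == 0).all(dim=0)

handle = model[1].register_forward_hook(record_dead_units)
model(torch.randn(64, 20))
handle.remove()  # remove the hook once the measurement is done

print("dead units:", stats["dead_mask"].nonzero().flatten().tolist())
```

A unit that is inactive on one batch is not necessarily dead, so in practice the mask would be accumulated over many batches before drawing conclusions.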

Gradient Norm Monitoring

total_norm = 0.0
for p in model.parameters():
    if p.grad is not None:
        # detach() replaces the legacy .data attribute
        param_norm = p.grad.detach().norm(2)
        total_norm += param_norm.item() ** 2
total_norm = total_norm ** 0.5  # global L2 norm over all parameter gradients
print(f"Total Gradient Norm: {total_norm:.4f}")

Logging this value every few steps shows whether gradient magnitudes stay stable, shrink toward zero, or blow up.
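
The same global norm is returned by torch.nn.utils.clip_grad_norm_, so monitoring and clipping can share one call. The sketch below compares the manual computation from the loop above with the returned value. The tiny linear model, the scaled-up loss, and the helper grad_norm are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(10, 5)
# Scale the loss up so the gradients are large enough to be clipped.
loss = model(torch.randn(32, 10)).pow(2).sum() * 100.0
loss.backward()

def grad_norm(parameters):
    # Global L2 norm over all gradients, as in the manual loop.
    total = sum(p.grad.detach().norm(2).item() ** 2
                for p in parameters if p.grad is not None)
    return total ** 0.5

manual_before = grad_norm(model.parameters())
# clip_grad_norm_ returns the total norm measured before clipping.
returned = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
manual_after = grad_norm(model.parameters())

print(manual_before, returned.item(), manual_after)
```

The returned value matches the manual norm, and the norm measured after the call is at most max_norm.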


Weight Distribution Analysis

Weights can also be visualized using TensorBoard histograms to check for:

  • Collapse toward zero (many near-zero values)

  • Divergence (too large magnitudes)

for name, param in model.named_parameters():
    writer.add_histogram(f"Weights/{name}", param, epoch)

Interpretation:

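  • Smooth, roughly bell-shaped histograms that shift gradually between epochs → usually a sign of stable learning

  • Histograms that abruptly widen or become spiky → possible instability, for example from a learning rate that is too high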


Gradient Vanishing Example

Consider a deep network using sigmoid activation (model is assumed here to be a stack of nn.Linear(100, 100) layers):

x = torch.randn(32, 100)
for layer in model.children():
    x = torch.sigmoid(layer(x))

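Here, the derivative of the sigmoid is at most 0.25, so each additional sigmoid layer scales the backpropagated gradient by a factor of 0.25 or less (times the layer's weights), and gradients reaching the early layers shrink toward zero.
Switching to ReLU, whose derivative is 1 for positive inputs, mitigates this: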

x = torch.relu(layer(x))
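
The difference can be measured directly. The sketch below builds the same deep network twice, once with sigmoid and once with ReLU, and compares the gradient norm at the first layer. The depth, width, random data, and helper name first_layer_grad_norm are illustrative assumptions, and both runs use PyTorch's default initialization.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(activation_cls, depth=10, width=100):
    torch.manual_seed(0)  # identical initial weights and data for both runs
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), activation_cls()]
    layers.append(nn.Linear(width, 10))
    model = nn.Sequential(*layers)

    x = torch.randn(32, width)
    labels = torch.randint(0, 10, (32,))
    loss = nn.CrossEntropyLoss()(model(x), labels)
    loss.backward()
    # Gradient magnitude at the layer farthest from the loss.
    return model[0].weight.grad.norm().item()

sigmoid_norm = first_layer_grad_norm(nn.Sigmoid)
relu_norm = first_layer_grad_norm(nn.ReLU)
print(f"sigmoid: {sigmoid_norm:.3e}, relu: {relu_norm:.3e}")
```

With default initialization the ReLU gradients may also shrink with depth, but typically far less than the sigmoid gradients do.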

Example: Visualizing Weights and Gradients in TensorBoard

for epoch in range(10):
    for inputs, labels in trainloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # Add to TensorBoard (gradients are those of the last batch in the epoch)
    for name, param in model.named_parameters():
        writer.add_histogram(f"Weights/{name}", param, epoch)
        if param.grad is not None:
            writer.add_histogram(f"Gradients/{name}", param.grad, epoch)

This produces histograms showing how weights and gradients evolve.


18.4 Summary

  • Debugging PyTorch models involves checking tensor shapes, gradients, learning rate, and loss consistency.

  • Tools like torch.autograd.set_detect_anomaly(True) help trace problematic operations during backpropagation.

  • TensorBoard provides an intuitive way to visualize model training, architecture, gradients, and weight distributions.

  • Gradient and weight analysis are essential for diagnosing vanishing or exploding gradients and ensuring stable convergence.


18.5 Exercises

Short Answer Questions

  1. What are the most common causes of loss divergence during training?

  2. How can TensorBoard assist in understanding model behavior?

  3. What does torch.autograd.set_detect_anomaly(True) do?

  4. Explain the difference between vanishing and exploding gradients.

  5. How can you visualize model weights in TensorBoard?

Hands-On Tasks

  1. Train a small CNN on the MNIST dataset and use TensorBoard to visualize loss and accuracy over epochs.

  2. Add gradient histograms to your TensorBoard logs and interpret how they change during training.

  3. Intentionally introduce a tensor shape mismatch error and debug it using print statements.

  4. Use torch.nn.utils.clip_grad_norm_ to prevent exploding gradients and observe its effect.

  5. Compare sigmoid and ReLU activations in a deep network and record the differences in gradient magnitudes.
