Appendix E: Common Errors & Debugging Tips in PyTorch

Abstract:
Common errors in PyTorch often stem from issues in tensor manipulation, gradient computation, and device management. Debugging these errors typically involves systematic checks and utilizing PyTorch's built-in tools.
Common Errors:
  • Shape Mismatches: 
    Occur during operations like matrix multiplication, concatenation, or view/reshape operations when tensor dimensions do not align.
    • Debugging Tip: Use tensor.shape or tensor.size() to inspect dimensions at various points in your code.
  • RuntimeError: Trying to backward through the graph a second time
    This happens when backward() is called a second time through a computation graph whose intermediate buffers have already been freed (by default they are released after the first backward pass).
    • Debugging Tip: Ensure loss.backward() is called only once per computation graph. If you need to retain the graph for multiple backward calls, use loss.backward(retain_graph=True), but be mindful of memory usage. Alternatively, recompute the relevant operations if possible.
  • inplace operation errors: 
    Result from modifying a tensor in-place that is required for gradient computation.
    • Debugging Tip: Avoid in-place operations (.add_(), .mul_(), etc.) on tensors that require gradients. Use out-of-place operations or clone() to create a copy before modifying.
  • CUDA Out of Memory (OOM) Errors: 
    Occur when the GPU's memory is exhausted.
    • Debugging Tip: Reduce batch size, use smaller models, or consider techniques like gradient accumulation. Utilize torch.cuda.empty_cache() to clear unused memory.
  • NaN or inf in Gradients/Loss: 
    Indicates numerical instability in the model.
    • Debugging Tip: Check for division by zero, log(0), or extremely large/small values. Use torch.autograd.set_detect_anomaly(True) to pinpoint the operation causing the NaNs. Consider gradient clipping.
  • Incorrect model.train() and model.eval() Usage: 
    Forgetting to switch between training and evaluation modes can lead to unexpected behavior, especially with layers like Dropout and BatchNorm.
    • Debugging Tip: Always call model.train() before training and model.eval() before evaluation/inference.
General Debugging Tips:
  • Start Small: Test with a single batch or a very small dataset to ensure the basic training loop and model forward/backward passes are working.
  • Print Statements: Use print(tensor.shape) and print(tensor.min(), tensor.max()) to inspect tensor values and shapes at critical points.
  • PyTorch Profiler: Use torch.profiler to identify performance bottlenecks and memory usage.
  • torch.autograd.gradcheck: Verify the correctness of custom autograd functions.
  • GPU vs. CPU: Ensure tensors are on the correct device (.to(device)) and that all necessary components (model, data, loss) are consistent in their device placement.
  • Logging: Implement comprehensive logging to track model performance, loss values, and other relevant metrics throughout training.




Debugging is an essential part of deep learning development. PyTorch offers flexibility and dynamic graphs, but users often encounter shape mismatches, device errors, gradient issues, and training instability. This appendix covers the most common problems along with practical debugging strategies and recommended best practices.


E.1 Overview of Common PyTorch Errors

Below is a high-level classification of errors frequently encountered:

  1. Tensor shape and dimension mismatches

  2. Device mismatch (CPU vs GPU)

  3. Incorrect use of .item(), .detach(), or .numpy()

  4. Autograd-related issues

  5. DataLoader and batching errors

  6. Model not training or loss not decreasing

  7. Exploding/vanishing gradients

  8. Incorrect model saving or loading

  9. Memory issues (GPU out of memory)

  10. Deprecated or incorrect API usage

Each of these categories is explained with examples and solutions.


E.2 Shape and Dimension Errors

One of the most common obstacles in PyTorch is the dreaded shape mismatch.


E.2.1 Example Error

RuntimeError: Expected input[64, 3, 224, 224] to have the same size as ... 

E.2.2 Common Causes

  • Input size does not match model requirements

  • Incorrect flattening or reshaping

  • Mismatched number of classes

  • Wrong feature map size after convolution layers


E.2.3 Debugging Tips

✔ Print shapes at every step

print(x.shape)

✔ Use torchsummary or torchinfo

from torchinfo import summary
summary(model, input_size=(1, 3, 224, 224))

✔ For linear layers after CNNs

Calculate flatten size:

print(x.view(x.size(0), -1).shape)

✔ Use assert for shape constraints

assert x.ndim == 4, "Input must be [B, C, H, W]"
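
Putting these tips together, the following is a minimal sketch (the SmallNet class and its layer sizes are illustrative assumptions, not a prescribed architecture) of probing the flatten size with a dummy batch before wiring up the final Linear layer:

import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # 16 * 112 * 112 was read off from the dummy-batch probe below
        self.fc = nn.Linear(16 * 112 * 112, 10)

    def forward(self, x):
        assert x.ndim == 4, "Input must be [B, C, H, W]"
        x = self.pool(torch.relu(self.conv(x)))
        x = x.view(x.size(0), -1)   # print(x.shape) here while debugging
        return self.fc(x)

dummy = torch.randn(2, 3, 224, 224)   # tiny dummy batch catches mismatches early
print(SmallNet()(dummy).shape)        # expected: torch.Size([2, 10])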

E.3 Device Mismatch Errors (CPU vs GPU)


E.3.1 Example Error

RuntimeError: Expected all tensors to be on the same device...

E.3.2 Causes

  • Model on GPU but data on CPU

  • Forgetting .to(device)

  • Loss function or labels left on CPU


E.3.3 Fix

Standard pattern:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

for x, y in dataloader:
    x, y = x.to(device), y.to(device)

E.3.4 Debug Tip

Inspect devices:

print(x.device, model.fc.weight.device)
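
A small helper can make this check systematic; check_devices below is a hypothetical utility, not part of PyTorch:

def check_devices(model, *tensors):
    # Collect the devices of all parameters and of the given tensors
    devices = {p.device for p in model.parameters()}
    devices.update(t.device for t in tensors)
    assert len(devices) == 1, f"Tensors live on multiple devices: {devices}"

check_devices(model, x, y)   # fails with a readable message before the cryptic runtime error does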

E.4 Issues with .item(), .detach(), .numpy()


E.4.1 Common Mistakes

❌ Calling .numpy() on GPU tensors

TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() first.

❌ Detaching when you still need gradients

loss.backward()  # errors if the loss itself was detached; an earlier detach silently blocks upstream gradients

❌ Using .item() on non-scalar tensors


E.4.2 Correct Usage

✔ Convert GPU tensor to NumPy

x.detach().cpu().numpy()

✔ Use .item() only for scalar values

loss_value = loss.item()

✔ For inference without gradients

with torch.no_grad():
    output = model(x)
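
A typical logging pattern during training combines these rules. The sketch below assumes a model, criterion, and batch (x, y) already exist; the variable names are illustrative:

running_losses = []

output = model(x)
loss = criterion(output, y)
loss.backward()

running_losses.append(loss.item())       # Python float, keeps no graph alive
preds = output.detach().cpu().numpy()    # safe to hand to NumPy or matplotlib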

E.5 Autograd Errors


E.5.1 Example

RuntimeError: element 0 of tensors does not require grad

Common Causes:

  • You accidentally used .detach()

  • You performed operations inside a with torch.no_grad() block

  • Model parameters were not registered correctly


E.5.2 Debug Tips

✔ Check if parameters require grad

for name, param in model.named_parameters():
    print(name, param.requires_grad)

✔ Ensure layers are assigned as class attributes

# WRONG — not registered as a layer
layer = nn.Linear(10, 5)

# RIGHT
self.layer = nn.Linear(10, 5)
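
The same pitfall applies to lists of layers: a plain Python list is not registered, while nn.ModuleList is. The MLP class below is an illustrative sketch:

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, sizes=(10, 32, 5)):
        super().__init__()
        # nn.ModuleList registers every layer; a plain Python list would not
        self.hidden = nn.ModuleList(
            nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])
        )

    def forward(self, x):
        for layer in self.hidden:
            x = layer(x)
        return x

print(sum(p.numel() for p in MLP().parameters()))   # non-zero => parameters registered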

E.6 Dataloader & Batch Errors


E.6.1 Common Problems

  • Wrong dataset return format

  • Labels not being integers

  • Incorrect transforms

  • Collate function errors

  • Batch dimension missing


E.6.2 Debug Tips

✔ Ensure dataset returns (input, label)

img, label = train_dataset[0]

✔ Confirm label types

print(type(label), label)

✔ Check batch shape

for batch in dataloader:
    x, y = batch
    print(x.shape, y.shape)
    break

✔ If using custom collate_fn

Test it manually with 2–3 items.
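
For example, a manual test might look like the sketch below (my_collate and the dataset indexing are illustrative assumptions):

samples = [train_dataset[i] for i in range(3)]   # grab a few raw samples
x, y = my_collate(samples)                       # run the collate function by hand
print(x.shape, y.shape, y.dtype)                 # labels for CrossEntropyLoss must be int64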


E.7 Model Not Training / Loss Not Decreasing


E.7.1 Causes

  • Wrong learning rate

  • Bad weight initialization

  • Incorrect loss function

  • Normalization missing

  • Gradients exploding or vanishing

  • Model too simple

  • Data labeling errors

  • Using softmax + CrossEntropyLoss (double softmax)


E.7.2 Quick Fix Checklist

✔ Use nn.CrossEntropyLoss() without softmax
✔ Try a lower learning rate (1e-3 → 1e-4)
✔ Check if model outputs correct shape
✔ Print some predictions
✔ Visualize data samples
✔ Verify labels and preprocessing
✔ Ensure shuffle=True in training DataLoader
✔ Use gradient clipping:

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
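
The first item on the checklist is worth a short illustration. nn.CrossEntropyLoss applies log-softmax internally, so the model should output raw logits; the shapes below are illustrative:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(8, 10)            # raw model outputs: [batch, num_classes]
labels = torch.randint(0, 10, (8,))    # class indices, dtype int64

loss = criterion(logits, labels)       # correct
# criterion(torch.softmax(logits, dim=1), labels)   # wrong: double softmax
print(loss.item())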

E.8 Exploding / Vanishing Gradients


E.8.1 Symptoms

  • Loss becomes NaN

  • Model diverges

  • Accuracy stuck

  • Gradients extremely large or small


E.8.2 Debug Tips

✔ Check gradient stats

for p in model.parameters():
    if p.grad is not None:   # gradients exist only after backward()
        print(p.grad.norm())

✔ Apply gradient clipping

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)

✔ Use better initialization

nn.init.xavier_uniform_(layer.weight)

✔ Use normalized activations

BatchNorm, LayerNorm
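
Returning to the clipping tip: clip_grad_norm_ also returns the total gradient norm before clipping, which makes a convenient single number to log each step (the threshold of 5.0 is illustrative):

total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
print(f"grad norm before clipping: {total_norm:.3f}")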


E.9 Model Saving & Loading Errors


E.9.1 Saver Errors

AttributeError: can't pickle local object

Fix

Define model classes at top level, not inside functions.


E.9.2 Loading Errors

RuntimeError: size mismatch for layer.weight...

Fix

Rebuild the model with exactly the same architecture (layer sizes, number of classes) that produced the checkpoint. Note that strict=False only skips missing or unexpected keys; it does not resolve size mismatches for keys that are present:

model.load_state_dict(torch.load("model.pth"), strict=False)

E.9.3 Correct Save/Load Pattern

Save:

torch.save(model.state_dict(), "model.pth")

Load:

model = MyModel()
model.load_state_dict(torch.load("model.pth"))
model.eval()
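
For resuming training it is common to save the optimizer state as well. The sketch below assumes a model, optimizer, and epoch counter exist; the file name is illustrative, and map_location avoids device errors when loading on a CPU-only machine:

checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "epoch": epoch,
}
torch.save(checkpoint, "checkpoint.pth")

checkpoint = torch.load("checkpoint.pth", map_location=device)
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])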

E.10 Out-of-Memory (OOM) Errors


E.10.1 Common Causes

  • Batch size too large

  • Too many workers in DataLoader

  • No with torch.no_grad() during evaluation

  • Storing tensors accidentally


E.10.2 Solutions

✔ Reduce batch size

✔ Use mixed precision (AMP)

with torch.cuda.amp.autocast():
    output = model(x)

✔ Clear cache

torch.cuda.empty_cache()

✔ Don’t store tensors in lists

Use .detach() if needed.
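
Expanding on the mixed-precision tip, a full training step also uses a GradScaler for the backward pass. The sketch assumes a model, optimizer, criterion, dataloader, and device are already defined:

scaler = torch.cuda.amp.GradScaler()

for x, y in dataloader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        output = model(x)
        loss = criterion(output, y)
    scaler.scale(loss).backward()    # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)           # unscales gradients, then takes the optimizer step
    scaler.update()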


E.11 Deprecated API Errors

PyTorch evolves rapidly; older tutorials may use deprecated functions.


Examples:

Deprecated                   →  Replacement
Variable()                   →  plain tensors (autograd is built in)
F.sigmoid                    →  torch.sigmoid
x.view(-1) for flattening    →  nn.Flatten()
volatile=True                →  with torch.no_grad()
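
The updated idioms look like this (shapes are illustrative):

import torch
import torch.nn as nn

x = torch.randn(4, 3, 8, 8)
flatten = nn.Flatten()                  # replaces x.view(x.size(0), -1)
print(flatten(x).shape)                 # torch.Size([4, 192])

probs = torch.sigmoid(torch.randn(4))   # replaces F.sigmoid
with torch.no_grad():                   # replaces volatile=True
    out = flatten(x)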

E.12 Debugging Tools in PyTorch


E.12.1 torch.autograd.set_detect_anomaly(True)

Useful for tracking the source of NaN or invalid backward passes.

torch.autograd.set_detect_anomaly(True)
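
Because anomaly detection slows training noticeably, the context-manager form can limit it to the suspicious part of the code (the training-step names below are illustrative):

with torch.autograd.detect_anomaly():
    loss = criterion(model(x), y)
    loss.backward()   # the traceback now points at the op that produced the NaN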

E.12.2 torchviz (for graph visualization)

pip install torchviz
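
A typical use is rendering the autograd graph of one forward pass; the model, input, and output file name below are illustrative:

from torchviz import make_dot

y = model(x)
make_dot(y, params=dict(model.named_parameters())).render("graph", format="png")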

E.12.3 TensorBoard

Monitor loss, gradients, histograms:

tensorboard --logdir runs
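
Logging from the training loop writes to the runs/ directory read by the command above. The tag names, global_step variable, and fc layer below are illustrative:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/experiment_1")
writer.add_scalar("loss/train", loss.item(), global_step)
writer.add_histogram("fc.weight", model.fc.weight, global_step)
writer.close()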

E.12.4 pdb (Python Debugger)

Insert at any point:

import pdb; pdb.set_trace()

E.13 Summary

This appendix covered the most common PyTorch issues, including:

  • Shape and dimension mismatches

  • Device and tensor type errors

  • Autograd and gradient problems

  • DataLoader and batching issues

  • Loss not decreasing or unstable training

  • Model saving/loading challenges

  • GPU memory limitations

  • Deprecated API pitfalls

By combining debugging strategies, assertions, visualization, and PyTorch’s built-in tools, developers can resolve errors quickly and maintain cleaner, more reliable deep learning code.


