Annexure 9: PyTorch Glossary of Key Terms (Beginner to Advanced)


This annexure compiles the most essential and frequently used PyTorch terms. It covers foundational concepts, intermediate constructs, and advanced components used in deep learning research and deployment.


A. Beginner-Level Terms

1. Tensor

A multi-dimensional array used for all computations in PyTorch. Analogous to NumPy arrays but with GPU support.
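
A minimal sketch of creating a tensor and moving it to the GPU when one is available:

```python
import torch

# Create a 2x3 tensor of random values
x = torch.randn(2, 3)
print(x.shape)  # torch.Size([2, 3])

# Move the tensor to the GPU if one is available
if torch.cuda.is_available():
    x = x.to("cuda")
```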

2. Tensor Rank

The number of dimensions (e.g., 0D scalar, 1D vector, 2D matrix).

3. Autograd

PyTorch’s automatic differentiation engine that computes gradients for tensors with requires_grad=True.
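
A small example of autograd in action (it also illustrates the computational graph and gradient entries below):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # PyTorch records these operations in a dynamic graph
y.backward()         # compute dy/dx by traversing the recorded graph
print(x.grad)        # tensor(7.) because dy/dx = 2x + 3 = 7 at x = 2
```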

4. Computational Graph

A directed graph representing operations performed on tensors. PyTorch builds it dynamically.

5. Gradient

The derivative of a function with respect to its variables; essential for optimization.

6. CUDA

NVIDIA’s GPU computing platform; enables tensor operations to run on GPUs via tensor.cuda() or tensor.to("cuda").

7. CPU vs GPU

CPU: A general-purpose processor.
GPU: Optimized for massively parallel computation, making it much faster for deep learning workloads.

8. Optimizer

A PyTorch object that updates model parameters based on gradients (e.g., SGD, Adam).

9. Loss Function

A function that measures the error between predictions and targets (e.g., MSELoss, CrossEntropyLoss).

10. Model / Network

A class derived from torch.nn.Module that defines the layers and the forward pass.
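
A minimal sketch of a custom model; the class name and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet()
out = model(torch.randn(1, 4))  # runs the forward pass
```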

11. DataLoader

Iterates over a Dataset in batches, with optional shuffling and multi-process loading.

12. Dataset

An abstraction representing the input data (torch.utils.data.Dataset); examples include built-in datasets such as MNIST and user-defined custom datasets.
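
A minimal sketch of a custom Dataset wrapped in a DataLoader (the toy data is illustrative):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Toy dataset of (x, x**2) pairs."""
    def __init__(self, n=100):
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.x[idx] ** 2

loader = DataLoader(SquaresDataset(), batch_size=16, shuffle=True)
for xb, yb in loader:  # each iteration yields one shuffled batch
    pass
```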

13. Epoch

One complete pass over the entire training dataset.

14. Batch Size

Number of samples processed before model update.

15. Learning Rate

Controls the size of each parameter update during training.
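
The terms above come together in a standard training loop. A minimal sketch on fabricated data (the model, data, and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # learning rate

X, y = torch.randn(64, 4), torch.randn(64, 1)

for epoch in range(5):                  # one epoch = one full pass over X
    for i in range(0, len(X), 16):      # batch size of 16
        xb, yb = X[i:i + 16], y[i:i + 16]
        optimizer.zero_grad()           # clear gradients from the last step
        loss = loss_fn(model(xb), yb)   # forward pass and loss
        loss.backward()                 # backward pass (compute gradients)
        optimizer.step()                # update parameters
```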


B. Intermediate-Level Terms

16. Module

The base class for all neural network components (nn.Module).

17. Forward Pass

Executing the model on input data to get outputs.

18. Backward Pass

PyTorch computes gradients using loss.backward().

19. state_dict

A Python dictionary containing model parameters and optimizer states.

20. Checkpoint

Saved model state used for restoring or resuming training.
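
A minimal save/restore sketch covering both state_dict and checkpoints (the file name is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save model and optimizer state together as a checkpoint
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, "ckpt.pt")

# Restore later to resume training
ckpt = torch.load("ckpt.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
```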

21. Activation Function

Introduces non-linearity (e.g., ReLU, Tanh, Sigmoid).

22. Dropout

Regularization technique that randomly zeros activations.

23. Batch Normalization

Normalizes layer inputs for stable training.

24. Gradient Clipping

Restricts gradient magnitude to avoid exploding gradients.
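
A short sketch; in practice the clipping call sits between loss.backward() and optimizer.step():

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).sum()
loss.backward()
# Rescale gradients in place so their global norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```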

25. Weight Decay

L2 regularization used in optimizers like AdamW.

26. Learning Rate Scheduler

Adjusts learning rate dynamically (StepLR, ReduceLROnPlateau).
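
A minimal StepLR sketch (the schedule values are illustrative):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ... training steps with optimizer.step() go here ...
    scheduler.step()  # advance the schedule once per epoch
```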

27. TorchScript

A PyTorch model representation used for deployment and optimization.
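
A minimal sketch of scripting and saving a model (the file name is illustrative):

```python
import torch

model = torch.nn.Linear(4, 1)
scripted = torch.jit.script(model)   # compile the module to TorchScript
scripted.save("model_scripted.pt")   # can be loaded from Python or C++
```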

28. ONNX

Open Neural Network Exchange format for cross-platform model deployment.

29. Mixed Precision Training

Training that mixes float16 (or bfloat16) and float32 to gain speed and reduce memory use.

30. AMP (Automatic Mixed Precision)

PyTorch tool for safe mixed precision via torch.cuda.amp.
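
A minimal AMP sketch, assuming a CUDA-capable GPU:

```python
import torch

model = torch.nn.Linear(4, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()   # scales the loss to avoid fp16 underflow

x, y = torch.randn(16, 4, device="cuda"), torch.randn(16, 1, device="cuda")

with torch.cuda.amp.autocast():        # ops run in float16 where it is safe
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()          # backward pass on the scaled loss
scaler.step(optimizer)                 # unscale gradients, then step
scaler.update()
```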


C. Advanced-Level Terms

31. DDP (Distributed Data Parallel)

Parallel training across multiple GPUs/machines.
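
A skeleton of a DDP script, assuming it is launched with torchrun (e.g., torchrun --nproc_per_node=2 train.py), which sets up the required environment variables:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")    # one process per GPU
rank = dist.get_rank()
model = torch.nn.Linear(4, 1).to(rank)
ddp_model = DDP(model, device_ids=[rank])  # gradients sync across processes
# ... the usual training loop runs on ddp_model ...
dist.destroy_process_group()
```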

32. RPC Framework

Enables remote execution across processes for model-parallel distributed training.

33. JIT Compiler

Just-In-Time compiler accelerating PyTorch models.

34. FX Tracer

PyTorch’s symbolic tracing tool (torch.fx) that captures models as an intermediate representation for graph transformations.

35. Quantization

Reduces model precision (INT8, FP16) for deployment.
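
A minimal sketch of post-training dynamic quantization (the model is illustrative):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(128, 64),
                            torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
# Convert Linear layers to INT8 for faster, smaller CPU inference
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
```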

36. Pruning

Removing unimportant weights to reduce model size.
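
A minimal sketch using PyTorch's built-in pruning utilities:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(10, 5)
# Zero out the 30% of weights with the smallest absolute value
prune.l1_unstructured(layer, name="weight", amount=0.3)
prune.remove(layer, "weight")  # make the pruning permanent
```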

37. Triton

A Python-based GPU kernel language and compiler integrated with PyTorch (used by the Inductor backend) for writing custom kernels.

38. Memory Pinning

Page-locked (pinned) host memory speeds up CPU-to-GPU transfers; enabled in DataLoader via pin_memory=True.
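
A short sketch, assuming a CUDA GPU is available:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(1000, 4))
# pin_memory=True keeps batches in page-locked host RAM
loader = DataLoader(ds, batch_size=32, pin_memory=True)

for (xb,) in loader:
    # non_blocking=True allows an asynchronous copy from pinned memory
    xb = xb.to("cuda", non_blocking=True)
    break
```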

39. TensorRT

NVIDIA engine for optimizing PyTorch models for inference.

40. Batch Inference

Running prediction on many inputs simultaneously for faster inference.

41. Micro-Batching

Splitting large batches into smaller ones to avoid OOM errors.

42. Gradient Accumulation

Accumulating gradients over batches to simulate large batch size.
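
A minimal sketch; the accumulation factor and data are illustrative:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 4  # effective batch size = 4 x the micro-batch size

for step in range(8):
    xb, yb = torch.randn(8, 4), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(xb), yb)
    (loss / accum_steps).backward()   # gradients add up across calls
    if (step + 1) % accum_steps == 0:
        optimizer.step()              # one update per accumulated batch
        optimizer.zero_grad()
```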

43. Graph Mode Execution

Optimized execution where dynamic graphs are converted to static graphs.

44. Autocast

Automatically selects precision for operations during mixed precision training.

45. Profiler

PyTorch tool to measure performance of CPU/GPU execution.
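
A minimal CPU-profiling sketch:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 128)
x = torch.randn(32, 128)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Print the five most expensive ops by total CPU time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```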

46. Custom Dataset & Collate Function

User-defined logic for loading data and customizing batch assembly.

47. Parameter Server

Architecture for distributed training used in very large models.

48. Checkpoint Sharding

Splitting model checkpoints across multiple files/devices.

49. Zero Redundancy Optimizer (ZeRO)

Memory-efficient optimizer-state sharding for training extremely large models (available in PyTorch as ZeroRedundancyOptimizer).

50. TorchDynamo

The graph-capture front end behind torch.compile that speeds up PyTorch execution.

51. Inductor

PyTorch’s native deep-learning compiler backend.
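
TorchDynamo and Inductor are exposed together through torch.compile (PyTorch 2.x). A minimal sketch:

```python
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

# torch.compile captures the graph with TorchDynamo and, by default,
# generates optimized code with the Inductor backend
compiled_f = torch.compile(f)
print(compiled_f(torch.randn(8)))
```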

52. Functorch / vmap

Function transforms for vectorizing operations across inputs; the functorch APIs now live in torch.func.
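
A minimal vmap sketch (PyTorch 2.x, where the functorch APIs live in torch.func):

```python
import torch
from torch.func import vmap

def dot(a, b):
    return (a * b).sum()

# Vectorize dot() over a leading batch dimension without a Python loop
batched_dot = vmap(dot)
print(batched_dot(torch.randn(10, 3), torch.randn(10, 3)).shape)  # torch.Size([10])
```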

53. Autograd Grad Mode

Controls gradient tracking (no_grad, inference_mode).
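
A short sketch contrasting the two context managers:

```python
import torch

model = torch.nn.Linear(4, 1)
x = torch.randn(1, 4)

with torch.no_grad():         # disables gradient tracking
    y1 = model(x)

with torch.inference_mode():  # stricter, slightly faster variant
    y2 = model(x)
```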

54. Lazy Tensors

Defers execution for optimization in large-scale systems.

55. TorchServe

Framework for serving PyTorch models in production.

56. Accelerate (HuggingFace)

Library simplifying multi-GPU/mixed-precision training.

57. Flash Attention

Optimized attention operation for large transformer models.

58. Memory-Efficient Attention

Techniques reducing memory used by transformer layers.

59. Kernel Fusion

Combining multiple GPU operations into one for speed.

60. PyTorch Lightning

High-level framework simplifying training loops while keeping flexibility.


D. Practical Quick Reference Table

| Term | Category | Short Definition |
| --- | --- | --- |
| Tensor | Basic | Main data structure |
| Autograd | Basic | Automatic gradient tracking |
| DataLoader | Basic | Batch data iterator |
| Optimizer | Basic | Updates model parameters |
| Scheduler | Intermediate | Adjusts learning rate |
| Dropout | Intermediate | Prevents overfitting |
| AMP | Intermediate | Mixed precision tool |
| DDP | Advanced | Multi-GPU training |
| TorchScript | Advanced | Deployable model format |
| Quantization | Advanced | Model size reduction |
| Profiler | Advanced | Performance measurement |

E. Conclusion

This glossary equips learners, students, and professionals with the most relevant PyTorch terminology. It acts as a rapid reference to reinforce conceptual clarity, improve coding fluency, and support advanced research and deployment workflows.


