Chapter 3: Automatic Differentiation with Autograd in PyTorch
Abstract:
The autograd package provides automatic differentiation for all operations on tensors, forming the backbone of neural network training in PyTorch. It operates as a define-by-run framework, meaning the backpropagation process is defined dynamically by the execution of your code. Here is how autograd works in PyTorch:

- Tensors with requires_grad=True: To enable autograd to track operations and compute gradients for a specific tensor, you must set its requires_grad attribute to True. This signals to PyTorch that the tensor is part of a computation for which gradients need to be calculated.

import torch
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

- Building the computation graph: As operations are performed on tensors with requires_grad=True, autograd implicitly constructs a dynamic computation graph. This graph records the sequence of operations and their dependencies. Each operation creates a new node in the graph, storing the information needed for gradient calculation during the backward pass.

z = x**2 + y**3  # This operation adds nodes to the computation graph

- Backward pass with .backward(): Once the forward pass is complete and a scalar loss value is obtained, calling the .backward() method on this scalar tensor initiates the backward pass. autograd traverses the computation graph in reverse, applying the chain rule to compute the gradients of the loss with respect to all tensors that have requires_grad=True.

z.backward()  # Computes gradients of z with respect to x and y

- Accessing gradients with .grad: After the backward pass, the computed gradients are stored in the .grad attribute of the respective tensors.

dz_dx = x.grad  # Gradient of z with respect to x
dz_dy = y.grad  # Gradient of z with respect to y

- Dynamic computation graph: autograd's define-by-run nature allows for flexible and dynamic graph construction, enabling control flow statements and varying tensor shapes and operations within each iteration.

- Chain rule: autograd leverages the chain rule of calculus to efficiently compute gradients through complex computational graphs.

- Optimizer integration: In neural network training, the computed gradients are used by optimizers (e.g., torch.optim.SGD, Adam) to update the model's parameters and minimize the loss function.

- torch.no_grad(): This context manager temporarily disables gradient tracking, which is useful during inference or when performing operations that should not contribute to the gradient computation.

- .detach(): The .detach() method creates a new tensor that shares the same data as the original but is detached from the computation graph, effectively stopping gradient flow through that point.
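To make the optimizer integration above concrete, here is a minimal sketch of a single training step. The tiny nn.Linear model, the random data, and the learning rate are illustrative assumptions, not part of the examples above:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)                                    # tiny illustrative model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 3)                                      # a batch of 4 samples with 3 features each
target = torch.randn(4, 1)

prediction = model(x)                                      # forward pass records the computation graph
loss = ((prediction - target) ** 2).mean()                 # mean squared error

optimizer.zero_grad()                                      # clear gradients from any previous step
loss.backward()                                            # autograd fills .grad on every parameter
optimizer.step()                                           # SGD uses those gradients to update the weights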
Let's explore Automatic Differentiation with Autograd in more depth.
The rest of this chapter follows a structured textbook format, with Learning Objectives, Core Concepts, Examples, and Exercises.
Chapter 3: Automatic Differentiation with Autograd
Learning Objectives
By the end of this chapter, you will be able to:
- Understand the concept of gradients and their role in optimization.
- Explain how PyTorch's Autograd system automatically computes derivatives.
- Implement gradient computation for tensors in PyTorch.
- Perform backpropagation to train neural networks.
- Learn how to disable gradient tracking to save computation.
- Apply Autograd through hands-on examples for real understanding.
3.1 Understanding Gradients
Gradients are fundamental to machine learning and deep learning. They measure how much a function’s output changes with respect to its inputs.
In simpler terms, gradients tell us the direction and rate of change of a function.
In deep learning, gradients are used to update model parameters (weights and biases) through optimization algorithms like Stochastic Gradient Descent (SGD).
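In its simplest form, a single SGD step moves each parameter \( w \) a small distance against its gradient of the loss \( L \), scaled by a learning rate \( \eta \):
\[
w \leftarrow w - \eta \, \frac{\partial L}{\partial w}
\]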
Mathematical Definition
If we have a function \( y = f(x) \),
the gradient of \( y \) with respect to \( x \) is:
\[
\frac{dy}{dx}
\]
For multivariable functions, the gradient becomes a vector of partial derivatives:
\[
\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right]
\]
In PyTorch, gradients are essential for backpropagation, which allows neural networks to learn by minimizing loss functions.
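As a minimal sketch of this idea (the function and the evaluation point below are chosen purely for illustration), autograd can produce such a gradient vector directly:

import torch

# Illustrative two-variable function f(x1, x2) = x1**2 + 3 * x2
x = torch.tensor([2.0, 5.0], requires_grad=True)
f = x[0] ** 2 + 3 * x[1]

f.backward()
print(x.grad)  # tensor([4., 3.]) -> [df/dx1, df/dx2] = [2*x1, 3]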
3.2 The Autograd System in PyTorch
PyTorch provides an automatic differentiation system called Autograd.
It tracks all operations performed on tensors and automatically computes gradients when required.
When you perform operations on tensors with requires_grad=True, PyTorch builds a computation graph that connects operations and variables.
During the backward pass, Autograd traverses this graph to calculate derivatives.
Computation Graph
A computation graph is a directed acyclic graph (DAG) where:
- Nodes represent tensors.
- Edges represent the operations performed on those tensors.
During forward propagation, PyTorch records these operations.
During backward propagation, it uses the chain rule to compute gradients.
Example: Creating a Tensor with Gradient Tracking
import torch
# Create a tensor with gradient tracking enabled
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 4 * x ** 2 + 6 * x + 5
print("y =", y)
Here, PyTorch internally records all operations on x so it can later compute \( \frac{dy}{dx} \).
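You can peek at the graph that was just recorded. The exact class names in the printout (e.g. AddBackward0) may vary between PyTorch versions; this small sketch only illustrates the idea:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 4 * x ** 2 + 6 * x + 5

print(x.is_leaf)                 # True: x is a leaf tensor tracked by autograd
print(y.grad_fn)                 # e.g. <AddBackward0 ...>, the last recorded operation
print(y.grad_fn.next_functions)  # the upstream graph nodes feeding that operation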
3.3 Computing Gradients
To compute gradients, we call the .backward() method on the output tensor.
PyTorch then automatically calculates the gradient for all tensors with requires_grad=True.
Example: Computing a Gradient
import torch
x = torch.tensor(2.0, requires_grad=True)
y = 3 * x**2 + 2 * x + 1
# Compute gradient
y.backward()
print("Gradient dy/dx:", x.grad)
Output:
Gradient dy/dx: tensor(14.)
Explanation:
\[
y = 3x^2 + 2x + 1 \implies \frac{dy}{dx} = 6x + 2
\]
When \( x = 2 \),
\[
\frac{dy}{dx} = 6(2) + 2 = 14
\]
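As a side note, torch.autograd.grad offers a functional alternative that returns the gradients directly instead of accumulating them into .grad. A minimal sketch for the same function:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = 3 * x ** 2 + 2 * x + 1

# Returns a tuple with one gradient per input tensor
(dy_dx,) = torch.autograd.grad(y, x)
print(dy_dx)  # tensor(14.)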
3.4 Backpropagation in Action
In deep learning, backpropagation is the process of computing gradients of the loss function with respect to model parameters.
Autograd automates this process. You define a loss function, call .backward(), and PyTorch computes all required gradients.
Example: Backpropagation for Multiple Variables
import torch
# Create tensors with requires_grad=True
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
# Define a function
z = x**2 + y**3
# Compute gradients
z.backward()
print("dz/dx:", x.grad)
print("dz/dy:", y.grad)
Output:
dz/dx: tensor(2.)
dz/dy: tensor(12.)
Explanation:
\[
z = x^2 + y^3
\]
\[
\frac{\partial z}{\partial x} = 2x = 2(1) = 2
\]
\[
\frac{\partial z}{\partial y} = 3y^2 = 3(2)^2 = 12
\]
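The same mechanics scale up to model parameters. Below is a hand-rolled sketch of backpropagation through a one-feature linear model; the data, the initial values of w and b, and the mean-squared-error loss are all illustrative assumptions:

import torch

# Illustrative parameters and data
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

x = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([2.0, 4.0, 6.0])

prediction = w * x + b
loss = ((prediction - target) ** 2).mean()  # mean squared error

loss.backward()
print(w.grad)  # tensor(-9.3333): d(loss)/dw
print(b.grad)  # tensor(-4.):     d(loss)/db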
3.5 Disabling Gradient Tracking
Sometimes, we don’t need to track gradients—especially during inference (when evaluating a trained model).
In such cases, disabling gradient tracking reduces memory usage and speeds up computation.
You can disable gradient tracking using:
- the torch.no_grad() context manager
- the .detach() method
Example 1: Using torch.no_grad()
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
# Disable gradient tracking
with torch.no_grad():
    z = y * 2
print(z.requires_grad)  # Output: False
Example 2: Using detach()
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = y.detach()
print(z.requires_grad)  # Output: False
Both methods create a tensor that does not track operations for gradients.
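To confirm that .detach() really cuts the gradient path, consider this small sketch (the particular function is just an example):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = y.detach() * x   # y.detach() is treated as a constant (value 9.0)

z.backward()
print(x.grad)  # tensor(9.): only the direct use of x contributes, not the path through y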
3.6 Hands-on Examples
Let’s explore some real applications of Autograd.
Example 1: Gradient of a Simple Function
x = torch.linspace(-2, 2, 5, requires_grad=True)
y = x**3 - 2 * x + 1
# Compute gradients for multiple values
y.sum().backward()
print("x:", x)
print("Gradients:", x.grad)
Here, we sum y before calling .backward() since it expects a scalar output.
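With x = [-2, -1, 0, 1, 2], the derivative \( 3x^2 - 2 \) evaluates to [10, 1, -2, 1, 10], so x.grad should print as tensor([10., 1., -2., 1., 10.]).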
Example 2: Gradient Descent Demonstration
Let's use Autograd to minimize a simple function \( f(x) = x^2 \).
x = torch.tensor(5.0, requires_grad=True)
learning_rate = 0.1
for step in range(20):
    y = x ** 2
    y.backward()
    
    with torch.no_grad():
        x -= learning_rate * x.grad  # Gradient descent update
        x.grad.zero_()               # Reset gradient after each iteration
    
    print(f"Step {step+1}: x = {x.item():.4f}, y = {y.item():.4f}")
Output (Approximate):
Step 1: x = 4.0000, y = 25.0000
Step 2: x = 3.2000, y = 16.0000
Step 3: x = 2.5600, y = 10.2400
...
Step 20: x ≈ 0.0576, y ≈ 0.0052
The value of x approaches 0, the minimum point of \( f(x) = x^2 \).
This illustrates how backpropagation and gradient descent work together.
Example 3: Gradient with Non-Scalar Outputs
If the output is not a scalar, you must pass a gradient argument to .backward().
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
# Backward with a gradient argument
y.backward(torch.tensor([1.0, 0.1, 0.01]))
print(x.grad)
This computes a weighted gradient across each output element.
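Since each element of y is 2x, every elementwise derivative is 2, and the supplied weights scale it: x.grad should print as tensor([2.0000, 0.2000, 0.0200]). The tensor passed to .backward() plays the role of the vector v in the vector-Jacobian product \( v^\top J \) that autograd computes for non-scalar outputs.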
Summary
| Concept | Description |
|---|---|
| Gradient | Measures the rate of change of a function's output with respect to its input. |
| Autograd | PyTorch's system for automatic differentiation. |
| .backward() | Computes gradients automatically. |
| .grad | Stores the computed gradients for tensors. |
| torch.no_grad() / .detach() | Used to disable gradient tracking during inference. |
Exercises
- Basic Gradient Calculation: Create a tensor \( x = 3.0 \) with requires_grad=True and compute the gradient of \( y = 4x^3 + 2x^2 + x \).
- Vector Function Gradient: Compute gradients for \( y = x_1^2 + 3x_2^3 \) using torch.autograd.
- Gradient Descent Practice: Write a program that minimizes \( f(x) = (x - 5)^2 \) using gradient descent with a learning rate of 0.05.
- No Gradient Tracking: Use torch.no_grad() to perform inference on a trained model without computing gradients.
- Challenge: Create a small PyTorch function that takes a scalar input \( x \) and returns both its value and derivative for any polynomial \( ax^3 + bx^2 + cx + d \).