Chapter 3: Automatic Differentiation with Autograd in PyTorch
Abstract:
The autograd package provides automatic differentiation for all operations on tensors, forming the backbone of neural network training in PyTorch. It operates as a define-by-run framework, meaning the backpropagation process is defined dynamically by the execution of your code. Here is how autograd works in PyTorch:

- Tensors with requires_grad=True: To enable autograd to track operations and compute gradients for a specific tensor, you must set its requires_grad attribute to True. This signals to PyTorch that the tensor is part of a computation for which gradients need to be calculated.

import torch
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

- Building the computation graph: As operations are performed on tensors with requires_grad=True, autograd implicitly constructs a dynamic computation graph. This graph records the sequence of operations and their dependencies. Each operation creates a new node in the graph, storing the information needed for gradient calculation during the backward pass.

z = x**2 + y**3  # This operation adds nodes to the computation graph

- Backward pass with .backward(): Once the forward pass is complete and a scalar loss value is obtained, calling the .backward() method on this scalar tensor initiates the backward pass. autograd traverses the computation graph in reverse, applying the chain rule to compute the gradients of the loss with respect to all tensors that have requires_grad=True.

z.backward()  # Computes gradients of z with respect to x and y

- Accessing gradients with .grad: After the backward pass, the computed gradients are stored in the .grad attribute of the respective tensors.

dz_dx = x.grad  # Gradient of z with respect to x
dz_dy = y.grad  # Gradient of z with respect to y

- Dynamic computation graph: autograd's define-by-run nature allows for flexible and dynamic graph construction, enabling control flow statements and varying tensor shapes and operations within each iteration.

- Chain rule: autograd leverages the chain rule of calculus to efficiently compute gradients through complex computational graphs.

- Optimizer integration: In neural network training, the computed gradients are used by optimizers (e.g., torch.optim.SGD, Adam) to update the model's parameters and minimize the loss function.

- torch.no_grad(): This context manager temporarily disables gradient tracking, which is useful during inference or when performing operations that should not contribute to the gradient computation.

- .detach(): The .detach() method creates a new tensor that shares the same data as the original but is detached from the computation graph, effectively stopping gradient flow through that point.
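To make the optimizer integration above concrete, here is a minimal sketch of a single training step. The tiny nn.Linear model, the random data, and the learning rate are illustrative assumptions, not part of the examples above:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)                                    # tiny illustrative model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 3)                                      # a batch of 4 samples with 3 features each
target = torch.randn(4, 1)

prediction = model(x)                                      # forward pass records the computation graph
loss = ((prediction - target) ** 2).mean()                 # mean squared error

optimizer.zero_grad()                                      # clear gradients from any previous step
loss.backward()                                            # autograd fills .grad on every parameter
optimizer.step()                                           # SGD uses those gradients to update the weights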
Let's explore Automatic Differentiation with Autograd in more depth.
The rest of this chapter follows a structured textbook format, with Learning Objectives, Core Concepts, Examples, and Exercises.
Chapter 3: Automatic Differentiation with Autograd
Learning Objectives
By the end of this chapter, you will be able to:
- Understand the concept of gradients and their role in optimization.
- Explain how PyTorch's Autograd system automatically computes derivatives.
- Implement gradient computation for tensors in PyTorch.
- Perform backpropagation to train neural networks.
- Learn how to disable gradient tracking to save computation.
- Apply Autograd through hands-on examples for real understanding.
3.1 Understanding Gradients
Gradients are fundamental to machine learning and deep learning. They measure how much a function’s output changes with respect to its inputs.
In simpler terms, gradients tell us the direction and rate of change of a function.
In deep learning, gradients are used to update model parameters (weights and biases) through optimization algorithms like Stochastic Gradient Descent (SGD).
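In its simplest form, a single SGD step moves each parameter \( w \) a small distance against its gradient of the loss \( L \), scaled by a learning rate \( \eta \):
\[
w \leftarrow w - \eta \, \frac{\partial L}{\partial w}
\]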
Mathematical Definition
If we have a function \( y = f(x) \),
the gradient of \( y \) with respect to \( x \) is:
\[
\frac{dy}{dx}
\]
For multivariable functions, the gradient becomes a vector of partial derivatives:
\[
\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right]
\]
In PyTorch, gradients are essential for backpropagation, which allows neural networks to learn by minimizing loss functions.
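As a minimal sketch of this idea (the function and the evaluation point below are chosen purely for illustration), autograd can produce such a gradient vector directly:

import torch

# Illustrative two-variable function f(x1, x2) = x1**2 + 3 * x2
x = torch.tensor([2.0, 5.0], requires_grad=True)
f = x[0] ** 2 + 3 * x[1]

f.backward()
print(x.grad)  # tensor([4., 3.]) -> [df/dx1, df/dx2] = [2*x1, 3]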
3.2 The Autograd System in PyTorch
PyTorch provides an automatic differentiation system called Autograd.
It tracks all operations performed on tensors and automatically computes gradients when required.
When you perform operations on tensors with requires_grad=True, PyTorch builds a computation graph that connects operations and variables.
During the backward pass, Autograd traverses this graph to calculate derivatives.
Computation Graph
A computation graph is a directed acyclic graph (DAG) where:
- Nodes represent tensors.
- Edges represent the operations performed on those tensors.
During forward propagation, PyTorch records these operations.
During backward propagation, it uses the chain rule to compute gradients.
Example: Creating a Tensor with Gradient Tracking
import torch
# Create a tensor with gradient tracking enabled
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 4 * x ** 2 + 6 * x + 5
print("y =", y)
Here, PyTorch internally records all operations on x so it can later compute \( \frac{dy}{dx} \).
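You can peek at the graph that was just recorded. The exact class names in the printout (e.g. AddBackward0) may vary between PyTorch versions; this small sketch only illustrates the idea:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 4 * x ** 2 + 6 * x + 5

print(x.is_leaf)                 # True: x is a leaf tensor tracked by autograd
print(y.grad_fn)                 # e.g. <AddBackward0 ...>, the last recorded operation
print(y.grad_fn.next_functions)  # the upstream graph nodes feeding that operation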
3.3 Computing Gradients
To compute gradients, we call the .backward() method on the output tensor.
PyTorch then automatically calculates the gradient for all tensors with requires_grad=True.
Example: Computing a Gradient
import torch
x = torch.tensor(2.0, requires_grad=True)
y = 3 * x**2 + 2 * x + 1
# Compute gradient
y.backward()
print("Gradient dy/dx:", x.grad)
Output:
Gradient dy/dx: tensor(14.)
Explanation:
\[
y = 3x^2 + 2x + 1 \implies \frac{dy}{dx} = 6x + 2
\]
When \( x = 2 \),
\[
\frac{dy}{dx} = 6(2) + 2 = 14
\]
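As a side note, torch.autograd.grad offers a functional alternative that returns the gradients directly instead of accumulating them into .grad. A minimal sketch for the same function:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = 3 * x ** 2 + 2 * x + 1

# Returns a tuple with one gradient per input tensor
(dy_dx,) = torch.autograd.grad(y, x)
print(dy_dx)  # tensor(14.)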
3.4 Backpropagation in Action
In deep learning, backpropagation is the process of computing gradients of the loss function with respect to model parameters.
Autograd automates this process. You define a loss function, call .backward(), and PyTorch computes all required gradients.
Example: Backpropagation for Multiple Variables
import torch
# Create tensors with requires_grad=True
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
# Define a function
z = x**2 + y**3
# Compute gradients
z.backward()
print("dz/dx:", x.grad)
print("dz/dy:", y.grad)
Output:
dz/dx: tensor(2.)
dz/dy: tensor(12.)
Explanation:
\[
z = x^2 + y^3
\]
\[
\frac{\partial z}{\partial x} = 2x = 2(1) = 2
\]
\[
\frac{\partial z}{\partial y} = 3y^2 = 3(2)^2 = 12
\]
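The same mechanics scale up to model parameters. Below is a hand-rolled sketch of backpropagation through a one-feature linear model; the data, the initial values of w and b, and the mean-squared-error loss are all illustrative assumptions:

import torch

# Illustrative parameters and data
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

x = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([2.0, 4.0, 6.0])

prediction = w * x + b
loss = ((prediction - target) ** 2).mean()  # mean squared error

loss.backward()
print(w.grad)  # tensor(-9.3333): d(loss)/dw
print(b.grad)  # tensor(-4.):     d(loss)/db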
3.5 Disabling Gradient Tracking
Sometimes, we don’t need to track gradients—especially during inference (when evaluating a trained model).
In such cases, disabling gradient tracking reduces memory usage and speeds up computation.
You can disable gradient tracking using:
- the torch.no_grad() context manager
- the .detach() method
Example 1: Using torch.no_grad()
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
# Disable gradient tracking
with torch.no_grad():
    z = y * 2
print(z.requires_grad)  # Output: False
Example 2: Using detach()
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = y.detach()
print(z.requires_grad)  # Output: False
Both methods create a tensor that does not track operations for gradients.
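To confirm that .detach() really cuts the gradient path, consider this small sketch (the particular function is just an example):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = y.detach() * x   # y.detach() is treated as a constant (value 9.0)

z.backward()
print(x.grad)  # tensor(9.): only the direct use of x contributes, not the path through y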
3.6 Hands-on Examples
Let’s explore some real applications of Autograd.
Example 1: Gradient of a Simple Function
x = torch.linspace(-2, 2, 5, requires_grad=True)
y = x**3 - 2 * x + 1
# Compute gradients for multiple values
y.sum().backward()
print("x:", x)
print("Gradients:", x.grad)
Here, we sum y before calling .backward() since it expects a scalar output.
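With x = [-2, -1, 0, 1, 2], the derivative \( 3x^2 - 2 \) evaluates to [10, 1, -2, 1, 10], so x.grad should print as tensor([10., 1., -2., 1., 10.]).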
Example 2: Gradient Descent Demonstration
Let's use Autograd to minimize a simple function \( f(x) = x^2 \).
x = torch.tensor(5.0, requires_grad=True)
learning_rate = 0.1
for step in range(20):
    y = x ** 2
    y.backward()
    
    with torch.no_grad():
        x -= learning_rate * x.grad  # Gradient descent update
        x.grad.zero_()               # Reset gradient after each iteration
    
    print(f"Step {step+1}: x = {x.item():.4f}, y = {y.item():.4f}")
Output (Approximate):
Step 1: x = 4.0000, y = 25.0000
Step 2: x = 3.2000, y = 16.0000
Step 3: x = 2.5600, y = 10.2400
...
Step 20: x ≈ 0.0576, y ≈ 0.0052
The value of x approaches 0, the minimum point of \( f(x) = x^2 \).
This illustrates how backpropagation and gradient descent work together.
Example 3: Gradient with Non-Scalar Outputs
If the output is not a scalar, you must pass a gradient argument to .backward().
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
# Backward with a gradient argument
y.backward(torch.tensor([1.0, 0.1, 0.01]))
print(x.grad)
This computes a weighted gradient across each output element.
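Since each element of y is 2x, every elementwise derivative is 2, and the supplied weights scale it: x.grad should print as tensor([2.0000, 0.2000, 0.0200]). The tensor passed to .backward() plays the role of the vector v in the vector-Jacobian product \( v^\top J \) that autograd computes for non-scalar outputs.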
Summary
| Concept | Description |
|---|---|
| Gradient | Measures the rate of change of a function's output with respect to its input. |
| Autograd | PyTorch's system for automatic differentiation. |
| .backward() | Computes gradients automatically. |
| .grad | Stores the computed gradients for tensors. |
| torch.no_grad() / .detach() | Used to disable gradient tracking during inference. |
Exercises
- Basic Gradient Calculation: Create a tensor \( x = 3.0 \) with requires_grad=True and compute the gradient of \( y = 4x^3 + 2x^2 + x \).
- Vector Function Gradient: Compute gradients for \( y = x_1^2 + 3x_2^3 \) using torch.autograd.
- Gradient Descent Practice: Write a program that minimizes \( f(x) = (x - 5)^2 \) using gradient descent with a learning rate of 0.05.
- No Gradient Tracking: Use torch.no_grad() to perform inference on a trained model without computing gradients.
- Challenge: Create a small PyTorch function that takes a scalar input \( x \) and returns both its value and derivative for any polynomial \( ax^3 + bx^2 + cx + d \).