Chapter 2: PyTorch Basics - Essentials for Mastering PyTorch

Abstract 

PyTorch is an open-source machine learning library primarily used for building and training deep learning models. Its key features and fundamental concepts include: 
1. Tensors:
  • Tensors are the fundamental data structure in PyTorch, similar to NumPy arrays but with GPU acceleration capabilities.
  • They represent multi-dimensional arrays and are used to store data, model parameters, and intermediate computations.
  • Operations on tensors are optimized for performance, especially on GPUs.
2. Autograd (Automatic Differentiation):
  • PyTorch's autograd engine automatically computes gradients for all operations on tensors with requires_grad=True.
  • This is crucial for backpropagation in neural networks, where gradients are used to update model parameters during training.
  • It builds a dynamic computation graph, allowing for flexible model architectures and conditional computations.
3. torch.nn Module:
  • This module provides pre-built layers, activation functions, loss functions, and other building blocks for constructing neural networks.
  • nn.Module is the base class for all neural network modules, allowing for easy creation of custom layers and models.
4. Optimizers:
  • The torch.optim module offers various optimization algorithms (e.g., SGD, Adam, RMSprop) to update model parameters based on computed gradients, minimizing the loss function.
5. Data Loading and Handling:
  • torch.utils.data provides Dataset and DataLoader classes for efficient data loading, batching, and shuffling during training.
  • Dataset defines how to access individual data samples, while DataLoader handles iterating over batches of data.
6. GPU Support:
  • PyTorch seamlessly integrates with NVIDIA GPUs (via CUDA) to accelerate computations, making deep learning model training significantly faster.
  • Tensors and models can be easily moved between CPU and GPU memory.
7. Model Training Loop:
  • A typical PyTorch training loop involves:
    • Defining the model, loss function, and optimizer.
    • Iterating over epochs and batches of data.
    • Performing a forward pass to get predictions.
    • Calculating the loss.
    • Performing a backward pass to compute gradients.
    • Updating model parameters using the optimizer.
In essence, PyTorch provides a flexible and efficient framework for developing and deploying deep learning models, leveraging Python's ease of use and powerful GPU acceleration.
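
To make the training loop described above concrete, here is a minimal sketch; the model, data, and hyperparameters are illustrative placeholders only, not a recipe:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                   # define the model
criterion = nn.MSELoss()                                   # define the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # define the optimizer

X = torch.rand(100, 10)   # toy inputs
y = torch.rand(100, 1)    # toy targets

for epoch in range(5):            # iterate over epochs
    pred = model(X)               # forward pass: get predictions
    loss = criterion(pred, y)     # calculate the loss
    optimizer.zero_grad()         # clear gradients from the previous step
    loss.backward()               # backward pass: compute gradients
    optimizer.step()              # update model parameters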

Let's explore PyTorch basics in depth, presented in full textbook style with learning objectives, detailed explanations, examples, and exercises.

Chapter 2: PyTorch Basics


Learning Objectives

After completing this chapter, learners will be able to:

  1. Understand what tensors are and why they are central to PyTorch.

  2. Create, manipulate, and perform operations on tensors.

  3. Use indexing, slicing, and reshaping techniques effectively.

  4. Understand and apply broadcasting rules in PyTorch tensor arithmetic.

  5. Utilize GPU acceleration with CUDA for efficient computation.


2.1 Tensors: Definition and Operations

What is a Tensor?

A tensor is a fundamental data structure in PyTorch—similar to NumPy arrays but optimized for GPU computation.
Tensors are multi-dimensional arrays that can represent scalars, vectors, matrices, and higher-dimensional data.

Rank | Example                        | Description
0-D  | torch.tensor(7)                | Scalar (single number)
1-D  | torch.tensor([1, 2, 3])        | Vector
2-D  | torch.tensor([[1, 2], [3, 4]]) | Matrix
3-D+ | e.g., image and video data     | Higher-dimensional data

Example: Creating a Simple Tensor

import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x)
print(x.dtype)
print(x.shape)

Output:

tensor([[1, 2, 3],
        [4, 5, 6]])
torch.int64
torch.Size([2, 3])

Key Tensor Operations

Operation             | Example                   | Description
Addition              | a + b or torch.add(a, b)  | Element-wise addition
Multiplication        | a * b or torch.mul(a, b)  | Element-wise multiplication
Matrix Multiplication | torch.mm(a, b) or a @ b   | Matrix product
Transpose             | a.T or a.transpose(0, 1)  | Switches rows and columns
Sum                   | a.sum()                   | Sum of all elements
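
The snippet below exercises each of these operations on two small matrices:

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

print(torch.add(a, b))   # element-wise addition
print(a * b)             # element-wise multiplication
print(a @ b)             # matrix multiplication -> [[19., 22.], [43., 50.]]
print(a.T)               # transpose -> [[1., 3.], [2., 4.]]
print(a.sum())           # sum of all elements -> tensor(10.)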

2.2 Tensor Creation and Manipulation

PyTorch provides various methods for tensor creation, both from existing data and randomized initialization.

Creating Tensors

# From list or tuple
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)

# From NumPy array
import numpy as np
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

# Using built-in functions
x_ones = torch.ones((2, 3))      # Tensor of ones
x_zeros = torch.zeros((2, 3))    # Tensor of zeros
x_rand = torch.rand((2, 3))      # Random values between 0 and 1
x_arange = torch.arange(0, 10, 2)  # Values 0, 2, 4, 6, 8
x_linspace = torch.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1

Manipulating Tensors

You can modify tensor shape, data type, and device.

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
x_reshaped = x.view(4)       # Reshape (2x2) to (4,)
x_transposed = x.t()         # Transpose
x_float64 = x.double()       # Change data type
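
A note on view versus reshape: view never copies data and therefore requires the tensor to be contiguous in memory, while reshape returns a view when possible and silently copies otherwise. A small sketch:

x = torch.arange(6).reshape(2, 3)
t = x.t()                        # transpose is a non-contiguous view
# t.view(6)                      # would raise a RuntimeError
flat = t.reshape(6)              # works: copies the data when needed
flat2 = t.contiguous().view(6)   # equivalent: make contiguous first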

Important Attributes

Attribute | Description                | Example
x.shape   | Tensor dimensions          | torch.Size([2, 3])
x.dtype   | Data type                  | torch.float32
x.device  | Device holding the tensor  | cpu or cuda:0
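
Checking these attributes on a freshly created tensor:

x = torch.rand(2, 3)
print(x.shape)    # torch.Size([2, 3])
print(x.dtype)    # torch.float32 (the default floating-point type)
print(x.device)   # cpu (unless the tensor has been moved to a GPU)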

2.3 Indexing, Slicing, and Reshaping

Manipulating parts of tensors is essential for data selection and transformation.

Indexing

You can access elements similarly to NumPy arrays.

x = torch.tensor([[10, 20, 30], [40, 50, 60]])
print(x[0, 1])   # 20
print(x[:, 2])   # [30, 60]

Slicing

Extract sub-tensors by specifying ranges.

print(x[0:2, 1:3])  # Elements from rows 0–1 and columns 1–2
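
Output:

tensor([[20, 30],
        [50, 60]])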

Reshaping

Reshaping allows conversion between different dimensions without changing data.

a = torch.arange(9)
b = a.reshape(3, 3)
c = b.flatten()
print(b)
print(c)

Output:

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])

Concatenation and Stacking

a = torch.tensor([[1, 2]])
b = torch.tensor([[3, 4]])
cat = torch.cat([a, b], dim=0)   # Vertical
stack = torch.stack([a, b])      # Adds new dimension
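
Note the resulting shapes: cat joins tensors along an existing dimension, while stack adds a new leading dimension.

print(cat.shape)     # torch.Size([2, 2])
print(stack.shape)   # torch.Size([2, 1, 2])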

2.4 Broadcasting and Tensor Arithmetic

Understanding Broadcasting

Broadcasting allows PyTorch to perform operations on tensors of different shapes by automatically expanding them to a compatible shape.

Example of broadcasting:

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
b = torch.tensor([1, 2, 3])

result = a + b
print(result)

Output:

tensor([[2, 4, 6],
        [5, 7, 9]])

Here, b was broadcast (virtually replicated across rows) to match the shape of a.
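
Broadcasting also works along the other axis. For instance, a column vector of shape (2, 1) is expanded across the columns of a:

col = torch.tensor([[10],
                    [20]])
print(a + col)

Output:

tensor([[11, 12, 13],
        [24, 25, 26]])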

Arithmetic Operations

Operation             | Function           | Description
Addition              | torch.add(a, b)    | Element-wise addition
Subtraction           | torch.sub(a, b)    | Element-wise subtraction
Multiplication        | torch.mul(a, b)    | Element-wise multiplication
Division              | torch.div(a, b)    | Element-wise division
Matrix Multiplication | torch.matmul(a, b) | Matrix product

Example:

x = torch.tensor([[2, 4], [6, 8]], dtype=torch.float32)
y = torch.tensor([[1, 3], [5, 7]], dtype=torch.float32)

print(torch.add(x, y))
print(torch.mul(x, y))
print(torch.matmul(x, y))
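
Output:

tensor([[ 3.,  7.],
        [11., 15.]])
tensor([[ 2., 12.],
        [30., 56.]])
tensor([[22., 34.],
        [46., 74.]])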

2.5 GPU and CUDA Basics

PyTorch provides seamless integration with CUDA, allowing tensor operations to run on the GPU, dramatically improving speed for large computations.

Checking GPU Availability

import torch

print(torch.cuda.is_available())  # Returns True if CUDA GPU is available

Moving Tensors to GPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(3, 3).to(device)
print(x.device)

GPU Operations

When a tensor is on the GPU, all operations are performed there.

y = torch.rand(3, 3).to(device)
z = x + y   # Computation happens on GPU

To move back to CPU:

z_cpu = z.to("cpu")

2.6 Summary

  • Tensors are the core data structure in PyTorch, enabling efficient mathematical computation.

  • You can create, index, slice, and reshape tensors with ease.

  • Broadcasting simplifies arithmetic between different-sized tensors.

  • CUDA allows acceleration using GPUs, improving training and inference speed.

Understanding these basics lays the foundation for working with neural networks, automatic differentiation, and deep learning models in subsequent chapters.


Exercises

Part A: Objective Questions

  1. What is the main difference between NumPy arrays and PyTorch tensors?

  2. Which method is used to check if CUDA is available in PyTorch?

  3. What function is used for reshaping tensors?

  4. What does broadcasting allow in tensor operations?

  5. How do you move a tensor x to GPU if available?


Part B: Practical Exercises

  1. Create a 3×3 tensor of random values, multiply it by a scalar, and print the result.

  2. Create two tensors of shape (2,3) and perform element-wise addition and multiplication.

  3. Reshape a 1-D tensor of size 12 into shape (3,4) and then flatten it.

  4. Demonstrate broadcasting between a (3×3) tensor and a (3,) tensor.

  5. Move a tensor to GPU, perform a matrix multiplication, and then transfer it back to CPU.


Part C: Challenge Task

Write a short PyTorch script that:

  • Creates two random tensors of size (1000×1000).

  • Performs matrix multiplication both on CPU and GPU.

  • Prints the time taken for each operation.

  • Concludes which device is faster.
