Chapter 2: PyTorch Basics - Essentials for Mastering PyTorch
Abstract
- Tensors are the fundamental data structure in PyTorch, similar to NumPy arrays but with GPU acceleration capabilities.
- They represent multi-dimensional arrays and are used to store data, model parameters, and intermediate computations.
- Operations on tensors are optimized for performance, especially on GPUs.
- PyTorch's autograd engine automatically computes gradients for all operations on tensors with requires_grad=True.
- This is crucial for backpropagation in neural networks, where gradients are used to update model parameters during training.
- It builds a dynamic computation graph, allowing for flexible model architectures and conditional computations.
- The torch.nn module provides pre-built layers, activation functions, loss functions, and other building blocks for constructing neural networks.
- nn.Module is the base class for all neural network modules, allowing for easy creation of custom layers and models.
- The torch.optim module offers various optimization algorithms (e.g., SGD, Adam, RMSprop) to update model parameters based on computed gradients, minimizing the loss function.
- torch.utils.data provides Dataset and DataLoader classes for efficient data loading, batching, and shuffling during training. Dataset defines how to access individual data samples, while DataLoader handles iterating over batches of data.
- PyTorch seamlessly integrates with NVIDIA GPUs (via CUDA) to accelerate computations, making deep learning model training significantly faster.
- Tensors and models can be easily moved between CPU and GPU memory.
- A typical PyTorch training loop involves the following steps (a minimal sketch follows this list):
- Defining the model, loss function, and optimizer.
- Iterating over epochs and batches of data.
- Performing a forward pass to get predictions.
- Calculating the loss.
- Performing a backward pass to compute gradients.
- Updating model parameters using the optimizer.
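To make these pieces concrete, here is a minimal sketch of such a loop. The toy data, layer sizes, and hyperparameters are illustrative assumptions, not prescribed by the chapter:

import torch
import torch.nn as nn

# Hypothetical data: 100 samples, 3 features, 1 regression target.
X = torch.rand(100, 3)
y = torch.rand(100, 1)

model = nn.Linear(3, 1)      # Define the model
loss_fn = nn.MSELoss()       # Define the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # Define the optimizer

for epoch in range(10):      # Iterate over epochs
    pred = model(X)          # Forward pass to get predictions
    loss = loss_fn(pred, y)  # Calculate the loss
    optimizer.zero_grad()    # Clear gradients from the previous step
    loss.backward()          # Backward pass to compute gradients
    optimizer.step()         # Update model parameters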
Chapter 2: PyTorch Basics
Learning Objectives
After completing this chapter, learners will be able to:
- Understand what tensors are and why they are central to PyTorch.
- Create, manipulate, and perform operations on tensors.
- Use indexing, slicing, and reshaping techniques effectively.
- Understand and apply broadcasting rules in PyTorch tensor arithmetic.
- Utilize GPU acceleration with CUDA for efficient computation.
2.1 Tensors: Definition and Operations
What is a Tensor?
A tensor is a fundamental data structure in PyTorch—similar to NumPy arrays but optimized for GPU computation.
Tensors are multi-dimensional arrays that can represent scalars, vectors, matrices, and higher-dimensional data.
| Tensor Rank | Example | Description |
|---|---|---|
| 0-D | torch.tensor(7) | Scalar (single number) |
| 1-D | torch.tensor([1, 2, 3]) | Vector |
| 2-D | torch.tensor([[1, 2], [3, 4]]) | Matrix |
| 3-D+ | Used for images, videos, etc. | Higher dimensions |
Example: Creating a Simple Tensor
import torch
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x)
print(x.dtype)
print(x.shape)
Output:
tensor([[1, 2, 3],
[4, 5, 6]])
torch.int64
torch.Size([2, 3])
Key Tensor Operations
| Operation | Example | Description |
|---|---|---|
| Addition | a + b or torch.add(a, b) | Element-wise addition |
| Multiplication | a * b or torch.mul(a, b) | Element-wise multiplication |
| Matrix Multiplication | torch.mm(a, b) or a @ b | Matrix product |
| Transpose | a.T or a.transpose(0, 1) | Switches rows and columns |
| Sum | a.sum() | Returns the sum of all elements |
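For instance, a quick check of these operations on two small matrices (a minimal sketch):

import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

print(a + b)    # Element-wise addition
print(a * b)    # Element-wise multiplication
print(a @ b)    # Matrix product: [[19., 22.], [43., 50.]]
print(a.T)      # Transpose: [[1., 3.], [2., 4.]]
print(a.sum())  # Sum of all elements: tensor(10.)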
2.2 Tensor Creation and Manipulation
PyTorch provides various methods for tensor creation, both from existing data and randomized initialization.
Creating Tensors
# From list or tuple
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
# From NumPy array
import numpy as np
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
# Using built-in functions
x_ones = torch.ones((2, 3)) # Tensor of ones
x_zeros = torch.zeros((2, 3)) # Tensor of zeros
x_rand = torch.rand((2, 3)) # Random values between 0 and 1
x_arange = torch.arange(0, 10, 2) # Values 0, 2, 4, 6, 8
x_linspace = torch.linspace(0, 1, 5) # Evenly spaced values between 0 and 1
Manipulating Tensors
You can modify tensor shape, data type, and device.
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
x_reshaped = x.view(4) # Reshape (2x2) to (4,)
x_transposed = x.t() # Transpose
x_float64 = x.double() # Change data type
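One caveat worth noting: view requires a compatible memory layout (in the simplest case, contiguous memory), while reshape falls back to copying when needed. Continuing from the example above:

print(x_transposed.is_contiguous())       # False: transposing changes strides, not data
# x_transposed.view(4) would raise a RuntimeError here
print(x_transposed.reshape(4))            # Works: reshape copies when a view is impossible
print(x_transposed.contiguous().view(4))  # Also works after an explicit contiguous copy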
Important Attributes
| Attribute | Description | Example |
|---|---|---|
| x.shape | Tensor dimensions | (2, 3) |
| x.dtype | Data type | torch.float32 |
| x.device | CPU or GPU | cpu or cuda:0 |
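These attributes can be inspected directly:

x = torch.rand(2, 3)
print(x.shape)   # torch.Size([2, 3])
print(x.dtype)   # torch.float32 (the default floating-point dtype)
print(x.device)  # cpu, or cuda:0 if the tensor lives on a GPU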
2.3 Indexing, Slicing, and Reshaping
Manipulating parts of tensors is essential for data selection and transformation.
Indexing
You can access elements similarly to NumPy arrays.
x = torch.tensor([[10, 20, 30], [40, 50, 60]])
print(x[0, 1]) # 20
print(x[:, 2]) # [30, 60]
Slicing
Extract sub-tensors by specifying ranges.
print(x[0:2, 1:3]) # Elements from rows 0–1 and columns 1–2
Reshaping
Reshaping allows conversion between different dimensions without changing data.
a = torch.arange(9)
b = a.reshape(3, 3)
c = b.flatten()
print(b)
print(c)
Output:
tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
Concatenation and Stacking
a = torch.tensor([[1, 2]])
b = torch.tensor([[3, 4]])
cat = torch.cat([a, b], dim=0) # Vertical
stack = torch.stack([a, b]) # Adds new dimension
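Printing the shapes makes the difference clear: cat joins tensors along an existing dimension, while stack adds a new one.

print(cat.shape)    # torch.Size([2, 2]): rows joined along dim 0
print(stack.shape)  # torch.Size([2, 1, 2]): a new leading dimension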
2.4 Broadcasting and Tensor Arithmetic
Understanding Broadcasting
Broadcasting allows PyTorch to perform operations on tensors of different shapes by automatically expanding them to a compatible shape.
Example of broadcasting:
a = torch.tensor([[1, 2, 3],
[4, 5, 6]])
b = torch.tensor([1, 2, 3])
result = a + b
print(result)
Output:
tensor([[2, 4, 6],
[5, 7, 9]])
Here, b was broadcast (copied across rows) to match the shape of a.
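Broadcasting compares shapes from the trailing dimension backward; dimensions of size 1 (or missing ones) are stretched to match. A second sketch, combining a column with a row:

col = torch.tensor([[1], [2], [3]])  # Shape (3, 1)
row = torch.tensor([10, 20, 30])     # Shape (3,), treated as (1, 3)
grid = col + row                     # Both stretched to shape (3, 3)
print(grid)
# tensor([[11, 21, 31],
#         [12, 22, 32],
#         [13, 23, 33]])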
Arithmetic Operations
| Operation | Function | Description |
|---|---|---|
| Addition | torch.add(a, b) | Element-wise addition |
| Subtraction | torch.sub(a, b) | Element-wise subtraction |
| Multiplication | torch.mul(a, b) | Element-wise multiplication |
| Division | torch.div(a, b) | Element-wise division |
| Matrix Multiplication | torch.matmul(a, b) | Matrix product |
Example:
x = torch.tensor([[2, 4], [6, 8]], dtype=torch.float32)
y = torch.tensor([[1, 3], [5, 7]], dtype=torch.float32)
print(torch.add(x, y))
print(torch.mul(x, y))
print(torch.matmul(x, y))
2.5 GPU and CUDA Basics
PyTorch provides seamless integration with CUDA, allowing tensor operations to run on the GPU, dramatically improving speed for large computations.
Checking GPU Availability
import torch
print(torch.cuda.is_available()) # Returns True if CUDA GPU is available
Moving Tensors to GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(3, 3).to(device)
print(x.device)
GPU Operations
When tensors are on the GPU, operations between them run on the GPU; note that all operands must be on the same device.
y = torch.rand(3, 3).to(device)
z = x + y # Computation happens on GPU
To move back to CPU:
z_cpu = z.to("cpu")
2.6 Summary
- Tensors are the core data structure in PyTorch, enabling efficient mathematical computation.
- You can create, index, slice, and reshape tensors with ease.
- Broadcasting simplifies arithmetic between different-sized tensors.
- CUDA allows acceleration using GPUs, improving training and inference speed.
Understanding these basics lays the foundation for working with neural networks, automatic differentiation, and deep learning models in subsequent chapters.
Exercises
Part A: Objective Questions
- What is the main difference between NumPy arrays and PyTorch tensors?
- Which method is used to check if CUDA is available in PyTorch?
- What function is used for reshaping tensors?
- What does broadcasting allow in tensor operations?
- How do you move a tensor x to GPU if available?
Part B: Practical Exercises
- Create a 3×3 tensor of random values, multiply it by a scalar, and print the result.
- Create two tensors of shape (2, 3) and perform element-wise addition and multiplication.
- Reshape a 1-D tensor of size 12 into shape (3, 4) and then flatten it.
- Demonstrate broadcasting between a (3, 3) tensor and a (3,) tensor.
- Move a tensor to GPU, perform a matrix multiplication, and then transfer it back to CPU.
Part C: Challenge Task
Write a short PyTorch script that:
- Creates two random tensors of size (1000×1000).
- Performs matrix multiplication both on CPU and GPU.
- Prints the time taken for each operation.
- Concludes which device is faster.
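One possible solution sketch. Note the torch.cuda.synchronize() calls: GPU kernels launch asynchronously, so without them the timings would be misleading.

import time
import torch

a = torch.rand(1000, 1000)
b = torch.rand(1000, 1000)

# Time the multiplication on CPU.
start = time.time()
c_cpu = a @ b
cpu_time = time.time() - start

if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    torch.cuda.synchronize()  # Wait for the transfers to finish
    start = time.time()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()  # Wait for the kernel to finish before timing
    gpu_time = time.time() - start
    print(f"CPU: {cpu_time:.4f}s, GPU: {gpu_time:.4f}s")
    print("GPU is faster" if gpu_time < cpu_time else "CPU is faster")
else:
    print(f"CPU: {cpu_time:.4f}s (no CUDA GPU available)")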