Chapter 11: Generative Models in PyTorch
Abstract:
- Generative Adversarial Networks (GANs):
  - GANs consist of two neural networks: a generator and a discriminator.
  - The generator learns to produce synthetic data (e.g., images) from a random noise vector.
  - The discriminator learns to distinguish real data from the generator's synthetic data.
  - The two are trained competitively: the generator aims to fool the discriminator, and the discriminator aims to identify fakes accurately.
- Variational Autoencoders (VAEs):
  - VAEs are a type of autoencoder that learns a probabilistic mapping from the input data to a latent space.
  - They generate new data by sampling from this learned latent distribution.
- Diffusion Models:
  - Diffusion models learn to generate data by reversing a gradual noising (diffusion) process.
  - A forward process progressively adds noise to the data, and the model learns a reverse process that denoises it step by step to produce new samples.
  - These models have become popular for high-quality image generation (e.g., Stable Diffusion, DALL-E 2).
- Autoregressive Models (e.g., Transformers for Text Generation):
  - Although not used only for generation, autoregressive models such as Transformer language models generate sequences (e.g., text) by predicting each next element from the preceding ones.
PyTorch provides torch.nn for building neural network layers, torch.optim for optimizers, and torch.utils.data for data loading and batching, all of which are essential for constructing and training generative models. Libraries such as torchvision also offer pre-trained models and datasets useful for computer vision tasks involving generative models.
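As a rough sketch of how these three modules fit together, the snippet below runs one pass of optimization on random stand-in data. The tensor shapes, the tiny model, and the learning rate are illustrative assumptions, not settings taken from this chapter.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data: 32 samples with 8 features each (illustrative only).
data = torch.rand(32, 8)
loader = DataLoader(TensorDataset(data), batch_size=16, shuffle=True)

# torch.nn supplies the layers; here a tiny model that maps 8 features back to 8.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 8))

# torch.optim supplies the optimizer that updates the model's parameters.
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# One pass over the loader: forward, loss, backward, parameter update.
losses = []
for (batch,) in loader:
    loss = criterion(model(batch), batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"batches seen: {len(losses)}, last loss: {losses[-1]:.4f}")
```

The same forward, loss, backward, step pattern appears in every training loop in this chapter; only the model and the loss change.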
Learning Objectives
After completing this chapter, you will be able to:
- Understand the principles behind generative modeling.
- Explain the structure and working of Autoencoders and Variational Autoencoders (VAEs).
- Understand the architecture and training process of Generative Adversarial Networks (GANs).
- Implement a simple GAN for image generation in PyTorch.
- Evaluate and improve the performance of generative models.
11.1 Introduction to Generative Models
Generative models are a class of machine learning models that can learn the underlying distribution of data and generate new data samples similar to the training data. Unlike discriminative models, which predict labels or classes, generative models focus on creating data.
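To make "learning the distribution and sampling from it" concrete, here is a deliberately tiny sketch: a one-dimensional Gaussian is fitted to toy data and then sampled. The toy data, its parameters, and the variable names are assumptions for illustration; the models in this chapter replace the Gaussian with a neural network.

```python
import torch

torch.manual_seed(0)

# Toy "training data": 10,000 draws from a Gaussian whose parameters the model does not know.
train = 3.0 + 2.0 * torch.randn(10_000)

# "Learning the distribution" here is just estimating its mean and standard deviation.
mu_hat = train.mean()
sigma_hat = train.std()

# "Generating" means drawing fresh samples from the fitted distribution.
samples = mu_hat + sigma_hat * torch.randn(5)

print(f"estimated mean={mu_hat.item():.2f}, std={sigma_hat.item():.2f}")
print("new samples:", samples)
```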
Key Applications
- Image synthesis (e.g., generating realistic faces, objects, or scenes)
- Data augmentation for training deep networks
- Anomaly detection
- Text and music generation
- Style transfer and super-resolution
11.2 Autoencoders
Definition
An Autoencoder is a type of neural network used for unsupervised learning that attempts to reconstruct its input. It compresses the input into a lower-dimensional representation (encoding) and then reconstructs the original data from this encoding (decoding).
Architecture
An autoencoder consists of two main parts:
- Encoder: Compresses the input data into a lower-dimensional latent space.
- Decoder: Reconstructs the input data from the latent representation.
\[
\text{Input} \; x \; \xrightarrow{\text{Encoder}} \; z \; \xrightarrow{\text{Decoder}} \; \hat{x}
\]
The model is trained to minimize the reconstruction loss:
\[
L(x, \hat{x}) = \| x - \hat{x} \|^2
\]
Example: Autoencoder in PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define the Autoencoder architecture
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder: 784 -> 128 -> 64
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64)
        )
        # Decoder: 64 -> 128 -> 784, with Sigmoid to keep outputs in [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

# Data and training: each image is flattened to a 784-dimensional vector
transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=True, download=True, transform=transform), batch_size=64, shuffle=True
)
model = Autoencoder()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(5):
    for data, _ in train_loader:
        reconstructed = model(data)
        loss = criterion(reconstructed, data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Report the loss of the last batch in each epoch
    print(f"Epoch [{epoch+1}/5], Loss: {loss.item():.4f}")
This simple autoencoder learns to compress and reconstruct MNIST digit images.
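The following sketch uses the same layer sizes as the example above, but with untrained weights and random stand-in inputs, to show what the encoder and decoder produce separately. Only the tensor shapes are meaningful here; the error values stay large until the model has been trained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as the chapter's autoencoder, rebuilt so this snippet runs on its own.
encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 64))
decoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 28 * 28), nn.Sigmoid())

# Random stand-in for a batch of 4 flattened 28x28 images with pixels in [0, 1].
x = torch.rand(4, 28 * 28)

with torch.no_grad():
    z = encoder(x)        # compressed representation, 64 values per image
    x_hat = decoder(z)    # reconstruction, back to 784 values per image
    # Mean squared reconstruction error for each image in the batch.
    per_sample_error = ((x - x_hat) ** 2).mean(dim=1)

print(z.shape, x_hat.shape, per_sample_error.shape)
```

A per-sample error of this kind is one common basis for anomaly detection: inputs that a trained autoencoder reconstructs poorly are flagged as unusual.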
11.3 Variational Autoencoders (VAEs)
Motivation
Standard autoencoders map each input to a single deterministic latent vector and place no constraint on how those vectors are distributed, so decoding an arbitrary point in the latent space often gives a poor sample.
Variational Autoencoders (VAEs) introduce a probabilistic approach to encoding, which makes them capable of generating new, unseen samples.
Key Idea
Instead of encoding inputs into a fixed vector \( z \), a VAE encodes them into a distribution, typically a Gaussian with mean \( \mu \) and standard deviation \( \sigma \) (variance \( \sigma^2 \)).
\[
z \sim \mathcal{N}(\mu(x), \sigma^2(x))
\]
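One common way to draw such a sample while keeping it differentiable is the reparameterization trick: z = mu + sigma * eps, with eps drawn from a standard normal. The sketch below uses hand-picked values for the mean and log-variance (assumptions for illustration) and checks that gradients reach both.

```python
import torch

torch.manual_seed(0)

# Hand-picked encoder outputs for one input with a 3-dimensional latent space.
mu = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
logvar = torch.tensor([0.0, -2.0, 1.0], requires_grad=True)

# Reparameterization: the randomness lives in eps, so z is a
# differentiable function of mu and logvar.
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
z = mu + eps * std

# Backpropagate through the sample.
z.sum().backward()

print("z:", z.detach())
print("dz/dmu:", mu.grad)          # equals 1 for every dimension
print("dz/dlogvar:", logvar.grad)  # equals 0.5 * eps * std
```

Sampling z directly from the distribution would block backpropagation into the encoder; moving the noise into eps is what lets the encoder be trained with ordinary gradient descent.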
VAE Loss Function
The loss function consists of two parts:
- Reconstruction Loss: Ensures output \( \hat{x} \) is similar to input \( x \).
- KL Divergence: Regularizes the latent distribution to be close to a standard normal distribution.
\[
L = \text{Reconstruction Loss} + \beta \times D_{KL}\left( q(z|x) \,\|\, p(z) \right)
\]
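For a diagonal Gaussian encoder and a standard normal prior, the KL term has the closed form -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2). As a sanity check, the sketch below compares that expression against torch.distributions for arbitrary example values of the mean and log-variance (the values themselves are assumptions chosen for illustration).

```python
import torch
from torch.distributions import Normal, kl_divergence

# Arbitrary example encoder outputs: batch of 2, latent dimension 3.
mu = torch.tensor([[0.5, -1.0, 2.0], [0.0, 0.0, 0.0]])
logvar = torch.tensor([[0.0, -2.0, 1.0], [0.0, 0.0, 0.0]])

# Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over all dimensions.
kl_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# The same quantity from torch.distributions, as a cross-check.
q = Normal(mu, torch.exp(0.5 * logvar))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kl_library = kl_divergence(q, p).sum()

print(kl_closed.item(), kl_library.item())
```

The second row of the batch already matches the prior (zero mean, unit variance), so it contributes nothing to the sum; only the first row is penalized.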
VAE in PyTorch
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20)  # mean
        self.fc22 = nn.Linear(400, 20)  # log-variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc21(h), self.fc22(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps keeps the sampling step differentiable
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h = torch.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def loss_function(recon_x, x, mu, logvar):
    # Reconstruction term: binary cross-entropy summed over pixels and batch
    BCE = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
    # KL divergence between N(mu, sigma^2) and N(0, 1), in closed form
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
After training, a VAE generates new data by drawing \( z \) from the standard normal prior and passing it through the decoder.
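A minimal sketch of that sampling step follows. The decoder here is a stand-in with the same layer sizes as the VAE above and untrained weights, so the outputs are not meaningful digits; with a trained model you would call its decode method instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in decoder with the same layer sizes as the VAE above (20 -> 400 -> 784).
# Its weights are untrained here; in practice, use the decoder of a trained VAE.
decoder = nn.Sequential(nn.Linear(20, 400), nn.ReLU(), nn.Linear(400, 784), nn.Sigmoid())

# Draw 16 latent vectors from the standard normal prior p(z).
z = torch.randn(16, 20)

with torch.no_grad():
    # Decode to pixel intensities in [0, 1] and reshape to image format.
    images = decoder(z).view(-1, 1, 28, 28)

print(images.shape)
```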
11.4 Generative Adversarial Networks (GANs)
Concept
Proposed by Ian Goodfellow and colleagues in 2014, Generative Adversarial Networks (GANs) consist of two competing neural networks:
- Generator (G): Tries to produce realistic fake data.
- Discriminator (D): Tries to distinguish between real and fake data.
The generator learns to fool the discriminator, and the discriminator learns to detect fakes.
Objective Function
\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))]
\]
This defines a minimax game between \( G \) and \( D \): the discriminator tries to maximize \( V \), while the generator tries to minimize it.
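To connect the objective to code, the sketch below estimates V(D, G) from made-up discriminator outputs and checks that the usual binary cross-entropy discriminator loss is its negative. It also shows the non-saturating generator loss, a widely used alternative to minimizing log(1 - D(G(z))) directly; the MNIST example later in this chapter uses that form through criterion(D(fake), real_labels). All numbers here are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Made-up discriminator outputs (probabilities of "real") for illustration.
d_real = torch.tensor([0.9, 0.8, 0.7])   # D(x) on real samples
d_fake = torch.tensor([0.2, 0.4, 0.1])   # D(G(z)) on generated samples

# Monte Carlo estimate of V(D, G) from the two batches.
value = torch.log(d_real).mean() + torch.log(1 - d_fake).mean()

# The usual discriminator loss: BCE with label 1 for real and 0 for fake.
bce = nn.BCELoss()
d_loss = bce(d_real, torch.ones(3)) + bce(d_fake, torch.zeros(3))

# Minimizing d_loss is the same as maximizing V(D, G).
print(value.item(), d_loss.item())

# Generator losses: the minimax form and the non-saturating alternative.
g_loss_minimax = torch.log(1 - d_fake).mean()
g_loss_non_saturating = bce(d_fake, torch.ones(3))   # equals -mean(log D(G(z)))
print(g_loss_minimax.item(), g_loss_non_saturating.item())
```

The non-saturating form gives the generator stronger gradients early in training, when the discriminator rejects most generated samples and log(1 - D(G(z))) is nearly flat.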
11.5 Training and Evaluating GANs
Training Steps
- Sample random noise \( z \) from a uniform or Gaussian distribution.
- Generate fake data \( G(z) \).
- Train the discriminator to distinguish between real and fake data.
- Train the generator to fool the discriminator.
These steps are repeated for every batch. In the ideal case, the adversarial process continues until the discriminator can no longer tell generated samples from real ones.
Evaluation Metrics
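Evaluating GANs is challenging because there is no single loss value that tracks sample quality. Common metrics include:
- Inception Score (IS): measures diversity and quality of generated images; higher is better.
- Fréchet Inception Distance (FID): compares the mean and covariance of Inception network features for real and generated images; lower is better.
- Visual Inspection: practical for smaller datasets.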
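The distance at the core of FID can be sketched in a few lines. The function below computes the Fréchet distance between Gaussians fitted to two feature sets; the function name is hypothetical, and the random tensors are stand-ins for the Inception-v3 activations that a real FID computation would use.

```python
import torch

def frechet_distance(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feats_a.mean(dim=0), feats_b.mean(dim=0)
    # torch.cov expects variables in rows, so transpose the (N, D) matrices.
    cov_a, cov_b = torch.cov(feats_a.T), torch.cov(feats_b.T)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the product's eigenvalues.
    eigvals = torch.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = torch.sqrt(eigvals).real.sum()
    return (mu_a - mu_b).pow(2).sum() + torch.trace(cov_a) + torch.trace(cov_b) - 2 * trace_sqrt

torch.manual_seed(0)
# Random stand-ins for feature vectors; real FID uses Inception-v3 activations.
real_feats = torch.randn(500, 8, dtype=torch.float64)
fake_feats = real_feats + 0.5   # same spread, mean shifted by 0.5 in each of 8 dims

print(frechet_distance(real_feats, real_feats).item())  # close to 0
print(frechet_distance(real_feats, fake_feats).item())  # close to 8 * 0.5**2 = 2.0
```

Because the shifted set has the same covariance as the original, the covariance terms cancel and only the squared mean difference remains, which makes the result easy to verify by hand.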
11.6 Example: Image Generation with a GAN (MNIST)
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.utils import save_image

# Generator Network: 100-dimensional noise -> 784-dimensional image in [-1, 1]
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)

# Discriminator Network: 784-dimensional image -> probability that it is real
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

# Initialize models
G = Generator()
D = Discriminator()

# Optimizers and loss
criterion = nn.BCELoss()
optimizer_G = optim.Adam(G.parameters(), lr=0.0002)
optimizer_D = optim.Adam(D.parameters(), lr=0.0002)

# Data: pixels are rescaled to [-1, 1] and each image is flattened
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
    transforms.Lambda(lambda x: x.view(-1))
])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=True, download=True, transform=transform), batch_size=128, shuffle=True
)

# Training loop
for epoch in range(5):
    for i, (imgs, _) in enumerate(train_loader):
        real = imgs
        batch_size = real.size(0)
        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # Train Discriminator (detach so no gradients flow into the generator)
        z = torch.randn(batch_size, 100)
        fake = G(z)
        D_loss = criterion(D(real), real_labels) + criterion(D(fake.detach()), fake_labels)
        optimizer_D.zero_grad()
        D_loss.backward()
        optimizer_D.step()

        # Train Generator (label fakes as real so the generator learns to fool D)
        G_loss = criterion(D(fake), real_labels)
        optimizer_G.zero_grad()
        G_loss.backward()
        optimizer_G.step()

    print(f"Epoch [{epoch+1}/5] | D_loss: {D_loss.item():.4f} | G_loss: {G_loss.item():.4f}")
    # Save sample images from the last batch of the epoch
    save_image(fake.view(fake.size(0), 1, 28, 28)[:25], f"fake_images_epoch{epoch+1}.png", nrow=5, normalize=True)
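As training progresses, the generator starts producing images that resemble MNIST digits, although samples from this simple fully connected GAN are usually still rough after only five epochs.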
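One design detail in the example above is worth spelling out: Normalize([0.5], [0.5]) rescales pixels from [0, 1] to [-1, 1], which matches the output range of the generator's final Tanh. The sketch below reproduces that mapping and its inverse with plain tensor arithmetic. The helper names are hypothetical; the inverse is what you would apply before displaying generated images if you do not rely on save_image's normalize=True.

```python
import torch

def to_tanh_range(x: torch.Tensor) -> torch.Tensor:
    """Map pixels from [0, 1] to [-1, 1], like Normalize([0.5], [0.5])."""
    return (x - 0.5) / 0.5

def to_pixel_range(x: torch.Tensor) -> torch.Tensor:
    """Inverse mapping from [-1, 1] (Tanh output) back to [0, 1]."""
    return x * 0.5 + 0.5

pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])
scaled = to_tanh_range(pixels)      # values: -1.0, -0.5, 0.0, 1.0
restored = to_pixel_range(scaled)   # values: 0.0, 0.25, 0.5, 1.0

print(scaled)
print(restored)
```

If real images were left in [0, 1] while the generator produced values in [-1, 1], the discriminator could separate them by range alone, so keeping the two ranges aligned matters.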
11.7 Summary
- Autoencoders compress data into latent space and reconstruct it.
- Variational Autoencoders (VAEs) extend this by learning a probabilistic latent distribution, enabling new data generation.
- Generative Adversarial Networks (GANs) use an adversarial approach with generator and discriminator networks.
- Generative models are powerful tools for image synthesis, data augmentation, and creative applications.
11.8 Exercises
- Conceptual Questions
  - Explain the key difference between an autoencoder and a variational autoencoder.
  - Why is KL divergence used in VAEs?
  - What are the main challenges in training GANs?
- Programming Tasks
  - Modify the autoencoder example to use convolutional layers for image data.
  - Implement a conditional GAN (cGAN) that generates digits based on class labels.
  - Compute the Fréchet Inception Distance (FID) between generated and real images using the torchmetrics library.
- Research-Oriented Tasks
  - Compare VAEs and GANs in terms of output quality and training stability.
  - Explore recent advancements such as StyleGAN or Diffusion Models and summarize their advantages over traditional GANs.