Chapter 11: Generative Models in PyTorch


Abstract:

Generative models in PyTorch are a class of deep learning models designed to create new data instances that resemble the training data. PyTorch, a flexible deep learning framework, provides the tools and functionalities necessary to implement various types of generative models.
Common Generative Models Implemented in PyTorch:
  • Generative Adversarial Networks (GANs):
    • GANs consist of two neural networks: a generator and a discriminator.
    • The generator learns to produce synthetic data (e.g., images) from a random noise vector.
    • The discriminator learns to distinguish between real data and the synthetic data generated by the generator.
    • They are trained in a competitive setup, where the generator aims to fool the discriminator, and the discriminator aims to accurately identify fakes.
  • Variational Autoencoders (VAEs):
    • VAEs are a type of autoencoder that learn a probabilistic mapping from the input data to a latent space.
    • They aim to generate new data by sampling from this learned latent distribution.
  • Diffusion Models:
    • Diffusion models learn to generate data by reversing a gradual diffusion process.
    • They progressively add noise to data during a forward process and then learn to denoise it during a reverse process to generate new samples.
    • These models have gained significant popularity for high-quality image generation (e.g., Stable Diffusion, DALL-E 2).
  • Autoregressive Models (e.g., Transformers for Text Generation):
    • Autoregressive models such as Transformers, when trained for language modeling, generate sequences (e.g., text) one element at a time by predicting the next element from the elements that precede it.
Implementing Generative Models in PyTorch:
PyTorch provides modules like torch.nn for building neural network layers, torch.optim for optimizers, and torch.utils.data for data loading and preprocessing, all of which are essential for constructing and training generative models. Libraries like torchvision also offer pre-trained models and datasets useful for computer vision tasks involving generative models.



Chapter 11: Generative Models

Learning Objectives

After completing this chapter, you will be able to:

  • Understand the principles behind generative modeling.

  • Explain the structure and working of Autoencoders and Variational Autoencoders (VAEs).

  • Understand the architecture and training process of Generative Adversarial Networks (GANs).

  • Implement a simple GAN for image generation in PyTorch.

  • Evaluate and improve the performance of generative models.


11.1 Introduction to Generative Models

Generative models are a class of machine learning models that can learn the underlying distribution of data and generate new data samples similar to the training data. Unlike discriminative models, which predict labels or classes, generative models focus on creating data.

Key Applications

  • Image synthesis (e.g., generating realistic faces, objects, or scenes)

  • Data augmentation for training deep networks

  • Anomaly detection

  • Text and music generation

  • Style transfer and super-resolution


11.2 Autoencoders

Definition

An Autoencoder is a type of neural network used for unsupervised learning that attempts to reconstruct its input. It compresses the input into a lower-dimensional representation (encoding) and then reconstructs the original data from this encoding (decoding).

Architecture

An autoencoder consists of two main parts:

  1. Encoder: Compresses the input data into a lower-dimensional latent space.

  2. Decoder: Reconstructs the input data from the latent representation.

\[
\text{Input } x \;\xrightarrow{\text{Encoder}}\; z \;\xrightarrow{\text{Decoder}}\; \hat{x}
\]

The model is trained to minimize the reconstruction loss:

\[
L(x, \hat{x}) = \| x - \hat{x} \|^2
\]

Example: Autoencoder in PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define the Autoencoder architecture
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64)
        )
        self.decoder = nn.Sequential(
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

# Data: flatten each 28x28 MNIST image into a 784-dimensional vector in [0, 1]
transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=True, download=True, transform=transform), batch_size=64, shuffle=True
)

model = Autoencoder()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(5):
    for data, _ in train_loader:
        reconstructed = model(data)
        loss = criterion(reconstructed, data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch [{epoch+1}/5], Loss: {loss.item():.4f}")

This simple autoencoder learns to compress and reconstruct MNIST digit images.
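
To see what the autoencoder has learned, it helps to compare an input image with its reconstruction. A minimal sketch, assuming matplotlib is available for plotting and reusing model and train_loader from above:

import matplotlib.pyplot as plt

model.eval()
with torch.no_grad():
    batch, _ = next(iter(train_loader))
    reconstructed = model(batch)

# Show the first original/reconstruction pair side by side
fig, axes = plt.subplots(1, 2)
axes[0].imshow(batch[0].view(28, 28).numpy(), cmap='gray')
axes[0].set_title('Original')
axes[1].imshow(reconstructed[0].view(28, 28).numpy(), cmap='gray')
axes[1].set_title('Reconstruction')
plt.show()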


11.3 Variational Autoencoders (VAEs)

Motivation

A standard autoencoder maps each input to a single deterministic point in latent space, and nothing constrains how those points are arranged, so decoding an arbitrary latent vector rarely yields a realistic sample.
Variational Autoencoders (VAEs) instead learn a probabilistic encoding whose latent space is regularized toward a known prior, which makes it possible to generate new, unseen samples by sampling from that prior.

Key Idea

Instead of encoding inputs into a fixed vector \( z \), a VAE encodes them into a distribution, typically a Gaussian with mean \( \mu(x) \) and standard deviation \( \sigma(x) \).

\[
z \sim \mathcal{N}\!\left(\mu(x), \sigma^2(x)\right)
\]

VAE Loss Function

The loss function consists of two parts:

  1. Reconstruction Loss: Ensures the output \( \hat{x} \) is similar to the input \( x \).

  2. KL Divergence: Regularizes the latent distribution to be close to a standard normal distribution.

\[
L = \text{Reconstruction Loss} + \beta \, D_{KL}\!\left( q(z \mid x) \,\|\, p(z) \right)
\]

VAE in PyTorch

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20)  # mean
        self.fc22 = nn.Linear(400, 20)  # log-variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc21(h), self.fc22(h)

    def reparameterize(self, mu, logvar):
        # Reparameterization trick: z = mu + eps * sigma keeps the sampling step
        # differentiable with respect to mu and logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h = torch.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def loss_function(recon_x, x, mu, logvar):
    # Reconstruction term: per-pixel binary cross-entropy, summed over the batch
    BCE = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
    # Closed-form KL divergence between N(mu, sigma^2) and the standard normal prior
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
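
The VAE can be trained much like the autoencoder in Section 11.2. A minimal training sketch, reusing the flattened MNIST train_loader defined there (an assumption; any loader yielding flattened images in [0, 1] works):

vae = VAE()
optimizer = optim.Adam(vae.parameters(), lr=1e-3)

for epoch in range(5):
    for data, _ in train_loader:
        recon, mu, logvar = vae(data)
        loss = loss_function(recon, data, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch [{epoch+1}/5], Loss: {loss.item():.2f}")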

Once trained, a VAE can generate new data samples by drawing \( z \) from the standard normal prior and passing it through the decoder, as sketched below.
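
A minimal sampling sketch, assuming the trained vae from the loop above:

from torchvision.utils import save_image

vae.eval()
with torch.no_grad():
    z = torch.randn(16, 20)                      # 20 matches the latent size of fc21/fc22
    samples = vae.decode(z).view(16, 1, 28, 28)  # decode and reshape into 28x28 images
save_image(samples, "vae_samples.png", nrow=4)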


11.4 Generative Adversarial Networks (GANs)

Concept

Proposed by Ian Goodfellow in 2014, Generative Adversarial Networks (GANs) consist of two competing neural networks:

  1. Generator (G): Tries to produce realistic fake data.

  2. Discriminator (D): Tries to distinguish between real and fake data.

The generator learns to fool the discriminator, and the discriminator learns to detect fakes.

Objective Function

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))]
\]

This defines a minimax game between \( G \) and \( D \): the discriminator tries to maximize the value function while the generator tries to minimize it.
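
In practice, the generator is usually trained with the non-saturating variant of this objective: instead of minimizing \( \log(1 - D(G(z))) \), which gives vanishing gradients early in training when the discriminator easily rejects fakes, the generator maximizes \( \log D(G(z)) \). This is what the BCE-based generator update in Section 11.6 implements. A minimal sketch of both updates, using placeholder discriminator outputs for illustration:

import torch
import torch.nn as nn

bce = nn.BCELoss()

# Placeholder tensors standing in for discriminator outputs on a batch of 8 samples
d_real = torch.sigmoid(torch.randn(8, 1))   # D(x) for real images
d_fake = torch.sigmoid(torch.randn(8, 1))   # D(G(z)) for generated images
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# Discriminator ascends the value function: maximize log D(x) + log(1 - D(G(z)))
d_loss = bce(d_real, ones) + bce(d_fake, zeros)

# Non-saturating generator loss: maximize log D(G(z)) by labeling fakes as real
g_loss = bce(d_fake, ones)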


11.5 Training and Evaluating GANs

Training Steps

  1. Sample random noise \( z \) from a uniform or Gaussian distribution.

  2. Generate fake data \( G(z) \).

  3. Train the discriminator to distinguish between real and fake data.

  4. Train the generator to fool the discriminator.

This adversarial process continues until the generator produces realistic data indistinguishable from real samples.

Evaluation Metrics

Evaluating GANs is challenging. Common metrics include:

  • Inception Score (IS) — measures diversity and quality of generated images.

  • Fréchet Inception Distance (FID) — compares statistics of real and generated data (see the sketch after this list).

  • Visual Inspection — practical for smaller datasets.
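
As a concrete example of the second metric, FID can be computed for the MNIST GAN of Section 11.6 with the torchmetrics library (an external dependency, assumed installed together with torch-fidelity; argument names may differ slightly between versions). A minimal sketch that converts flattened, Tanh-scaled batches into the 3-channel images the Inception backbone expects:

import torch
import torch.nn.functional as F
from torchmetrics.image.fid import FrechetInceptionDistance

# feature=64 keeps this example light; 2048 is the standard setting but needs
# thousands of samples per set for a stable estimate
fid = FrechetInceptionDistance(feature=64, normalize=True)

def to_rgb(flat_batch):
    # Hypothetical helper: (N, 784) values in [-1, 1] -> (N, 3, 299, 299) floats in [0, 1]
    imgs = flat_batch.view(-1, 1, 28, 28).repeat(1, 3, 1, 1)
    imgs = (imgs + 1) / 2
    return F.interpolate(imgs, size=(299, 299), mode='bilinear', align_corners=False)

# Placeholder tensors for illustration; in practice, feed batches of real images
# and generator outputs collected during or after training
real_batch = torch.rand(128, 784) * 2 - 1
fake_batch = torch.rand(128, 784) * 2 - 1

fid.update(to_rgb(real_batch), real=True)
fid.update(to_rgb(fake_batch), real=False)
print(f"FID: {fid.compute().item():.2f}")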


11.6 Example: Image Generation with a GAN (MNIST)

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.utils import save_image

# Generator Network
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)

# Discriminator Network
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

# Initialize models
G = Generator()
D = Discriminator()

# Optimizers and loss
criterion = nn.BCELoss()
optimizer_G = optim.Adam(G.parameters(), lr=0.0002)
optimizer_D = optim.Adam(D.parameters(), lr=0.0002)

# Data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
    transforms.Lambda(lambda x: x.view(-1))
])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=True, download=True, transform=transform), batch_size=128, shuffle=True
)

# Training loop
for epoch in range(5):
    for i, (imgs, _) in enumerate(train_loader):
        real = imgs
        batch_size = real.size(0)
        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # Train Discriminator: real images should be scored as 1, generated images as 0
        z = torch.randn(batch_size, 100)
        fake = G(z)
        # detach() keeps the discriminator loss from backpropagating into the generator
        D_loss = criterion(D(real), real_labels) + criterion(D(fake.detach()), fake_labels)
        optimizer_D.zero_grad()
        D_loss.backward()
        optimizer_D.step()

        # Train Generator: non-saturating loss, pushing D(G(z)) toward the "real" label
        G_loss = criterion(D(fake), real_labels)
        optimizer_G.zero_grad()
        G_loss.backward()
        optimizer_G.step()

    print(f"Epoch [{epoch+1}/5] | D_loss: {D_loss.item():.4f} | G_loss: {G_loss.item():.4f}")

    # Save sample images
    save_image(fake.view(fake.size(0), 1, 28, 28)[:25], f"fake_images_epoch{epoch+1}.png", nrow=5, normalize=True)

After training for enough epochs, the generator starts producing digits similar to those in the MNIST dataset; the sketch below shows how to sample from it outside the training loop.
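
A minimal sampling sketch, reusing G and save_image from the example above:

# Draw fresh noise vectors and map them through the trained generator
G.eval()
with torch.no_grad():
    z = torch.randn(25, 100)
    samples = G(z).view(25, 1, 28, 28)
save_image(samples, "generated_digits.png", nrow=5, normalize=True)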


11.7 Summary

  • Autoencoders compress data into latent space and reconstruct it.

  • Variational Autoencoders (VAEs) extend this by learning a probabilistic latent distribution, enabling new data generation.

  • Generative Adversarial Networks (GANs) use an adversarial approach with generator and discriminator networks.

  • Generative models are powerful tools for image synthesis, data augmentation, and creative applications.


11.8 Exercises

  1. Conceptual Questions

    • Explain the key difference between an autoencoder and a variational autoencoder.

    • Why is KL divergence used in VAEs?

    • What are the main challenges in training GANs?

  2. Programming Tasks

    • Modify the autoencoder example to use convolutional layers for image data.

    • Implement a conditional GAN (cGAN) that generates digits based on class labels.

    • Compute the Fréchet Inception Distance (FID) between generated and real images using the torchmetrics library.

  3. Research-Oriented Tasks

    • Compare VAEs and GANs in terms of output quality and training stability.

    • Explore recent advancements such as StyleGAN or Diffusion Models and summarize their advantages over traditional GANs.
