**Special Annexure 1**

**PyTorch Interview Questions and Answers (Basic to Advanced)**

This annexure compiles curated technical interview questions frequently asked in academic, industrial, and research roles involving PyTorch. The questions span beginner, intermediate, and advanced levels, covering tensors, autograd, neural networks, optimization, GPU acceleration, deployment, and troubleshooting.

Section A: Basic-Level Questions

1. What is PyTorch?

PyTorch is an open-source deep learning framework originally developed by Facebook AI Research (now Meta AI). It provides:

Dynamic computational graphs
Efficient tensor operations
Automatic differentiation
High flexibility for research and prototyping

2. What is a tensor in PyTorch?

A tensor is a multidimensional array, similar to a NumPy array, that can live either in CPU memory or on a CUDA-capable GPU.

Unlike NumPy arrays, PyTorch tensors support GPU acceleration and automatic differentiation (autograd).

3. How do you create a tensor in PyTorch?

x = torch.tensor([1, 2, 3])

Other methods:

torch.zeros()
torch.ones()
torch.randn()
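
As a small illustration of these factory functions (the shapes are arbitrary choices made for this sketch):

```python
import torch

x = torch.tensor([1, 2, 3])   # from a Python list; dtype inferred as int64
z = torch.zeros(2, 3)         # 2x3 tensor of zeros (float32 by default)
o = torch.ones(2, 3)          # 2x3 tensor of ones
r = torch.randn(2, 3)         # 2x3 samples from a standard normal distribution

print(x.dtype, z.shape, r.shape)
```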

4. What is the difference between NumPy arrays and PyTorch tensors?

| Feature | NumPy | PyTorch |
| --- | --- | --- |
| GPU Support | ❌ No | ✔ Yes (`cuda`) |
| Autograd | ❌ No | ✔ Yes |
| Deep Learning | Indirect | Native |

5. How do you check if CUDA is available?

torch.cuda.is_available()
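
In practice the check is usually folded into a device selection. A minimal sketch, where the variable name `device` is only a common convention:

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 2, device=device)   # allocate directly on the chosen device
y = (x * 2).cpu()                     # move the result back to host memory
print(device, y)
```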

6. What is Autograd in PyTorch?

Autograd is PyTorch’s automatic differentiation engine.
It tracks operations and computes gradients for tensors with requires_grad=True.

7. How do you disable gradient calculation?

with torch.no_grad():
    output = model(x)
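
The effect can be checked directly. In this sketch `model` is a stand-in `nn.Linear` layer rather than any particular network:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # stand-in model for illustration
x = torch.randn(3, 4)

with torch.no_grad():
    output = model(x)        # no graph is recorded inside this block

tracked = model(x)           # the same call outside the block is tracked
print(output.requires_grad, tracked.requires_grad)
```

Skipping graph construction saves memory and time during inference and evaluation.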

8. Difference between `model.train()` and `model.eval()`?

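| Mode | Purpose |
| --- | --- |
| `model.train()` | Enables dropout; batch norm normalizes with batch statistics and updates its running statistics |
| `model.eval()` | Disables dropout; batch norm uses its stored running statistics |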
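
The difference is easy to observe on a single dropout layer. A small sketch, with `p=0.5` and the input size chosen arbitrarily:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                 # training mode: elements are randomly zeroed
train_out = drop(x)          # survivors are scaled by 1 / (1 - p) = 2.0

drop.eval()                  # evaluation mode: dropout is the identity
eval_out = drop(x)

print((train_out == 0).sum().item(), torch.equal(eval_out, x))
```

Note that neither mode turns gradient tracking on or off; that is the job of `torch.no_grad()`.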

9. What is a DataLoader?

A DataLoader:

Loads data in batches
Supports shuffling
Uses multiprocessing (num_workers)

loader = DataLoader(dataset, batch_size=32, shuffle=True)
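
A self-contained sketch of the same call, using a toy `TensorDataset` (the sizes are illustrative assumptions):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 100 samples with 8 features each and a binary label.
features = torch.randn(100, 8)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_sizes = [xb.shape[0] for xb, yb in loader]
print(batch_sizes)   # the last batch is smaller because 100 is not a multiple of 32
```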

10. What is the purpose of an optimizer?

Optimizers update model parameters (weights) using gradients during training.
Examples:

SGD
Adam
RMSProp

Section B: Intermediate-Level Questions

11. What does `backward()` do?

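Computes the gradient of the loss with respect to every leaf tensor in the computation graph that has requires_grad=True, and accumulates the result in each tensor's .grad attribute: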

loss.backward()

12. How do you update parameters in PyTorch?

optimizer.step()

Call it only after clearing the old gradients and computing new ones:

optimizer.zero_grad()
loss.backward()
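
Put together, one training step might look like the following sketch. The linear model, SGD settings, and random data are placeholders chosen for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(16, 3)
y = torch.randn(16, 1)

before = model.weight.detach().clone()

optimizer.zero_grad()          # 1. clear gradients left from the previous step
loss = loss_fn(model(x), y)    # 2. forward pass
loss.backward()                # 3. populate .grad on each parameter
optimizer.step()               # 4. apply the update rule

print(loss.item())
```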

13. What is a custom Dataset class?

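A user-defined dataset that inherits from torch.utils.data.Dataset. A map-style dataset implements `__getitem__` and `__len__`; the abbreviated snippet here shows only `__getitem__` and assumes `data` and `labels` are defined elsewhere.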

class MyDataset(Dataset):
    def __getitem__(self, idx):
        return data[idx], labels[idx]
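
A fuller version of the class, as one possible sketch. The constructor arguments and the toy tensors are assumptions added for illustration:

```python
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data, labels):
        # Keep references on the instance rather than relying on globals.
        self.data = data
        self.labels = labels

    def __len__(self):
        # DataLoader uses this to know how many samples exist.
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

ds = MyDataset(torch.arange(10.0).reshape(5, 2), torch.tensor([0, 1, 0, 1, 0]))
sample, label = ds[1]
print(len(ds), sample, label)
```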

14. What is the purpose of `collate_fn` in DataLoader?

It defines how a batch of samples is combined.

Useful for:

Variable-length sequences (text)
Audio clips
Complex structures
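
For variable-length sequences, a common approach is to pad each batch to its longest member. A minimal sketch, where `pad_collate` is a hypothetical helper name and the sequences are toy data:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy "dataset": a list of 1-D tensors with different lengths.
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

def pad_collate(batch):
    """Pad every sequence in the batch to the length of the longest one."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths

loader = DataLoader(sequences, batch_size=3, collate_fn=pad_collate)
padded, lengths = next(iter(loader))
print(padded)
print(lengths)
```

Returning the original lengths alongside the padded tensor lets downstream code mask out the padding.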

15. Explain dynamic computation graph.

PyTorch builds the graph on-the-fly during execution.
This means:

Flexible designs
Easy debugging
Better suited for NLP/RL tasks
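
A short sketch of what "on-the-fly" means: the number of recorded operations below depends on the input values, and autograd still differentiates through whatever actually ran. The function and threshold are made up for illustration:

```python
import torch

def forward(x):
    # Ordinary Python control flow decides which operations are recorded.
    steps = 0
    while x.norm() < 10:
        x = x * 2
        steps += 1
    return x.sum(), steps

a = torch.tensor([1.0, 1.0], requires_grad=True)
out, steps = forward(a)
out.backward()
print(steps, a.grad)
```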

16. How do you save and load models?

Recommended method: save only the weights (the state_dict), then rebuild the model architecture and load them into it:

torch.save(model.state_dict(), 'model.pth')
model.load_state_dict(torch.load('model.pth'))
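
A round-trip sketch of this pattern. The temporary directory and the `nn.Linear` stand-in are assumptions for the example:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")
    torch.save(model.state_dict(), path)

    # The architecture must be rebuilt before the weights can be loaded.
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location="cpu")   # load onto CPU regardless of where it was saved
    restored.load_state_dict(state)

restored.eval()   # switch to inference mode before evaluating
print(torch.equal(model.weight, restored.weight))
```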

17. What are `nn.Module` and `nn.functional`?

| Component | Description |
| --- | --- |
| `nn.Module` | Layer/object class (state + parameters) |
| `nn.functional` | Stateless functions (like `F.relu`) |

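Example:

nn.ReLU() is a module object that can be registered inside a model (for example in nn.Sequential), although it holds no parameters
F.relu() is a plain function call with no object to register
Layers with parameters, such as nn.Linear, store their weights in the module, whereas F.linear expects the weights as arguments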
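
The two styles compute the same values; they differ in where parameters live. A brief sketch with arbitrary shapes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])

relu_module = nn.ReLU()                          # object; usable inside nn.Sequential
same = torch.equal(relu_module(x), F.relu(x))    # both compute max(x, 0)

linear = nn.Linear(3, 2)                         # module owns its weight and bias
manual = F.linear(x, linear.weight, linear.bias) # functional form takes them as arguments
print(same, torch.allclose(linear(x), manual))
```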

18. What are hooks used for?

Hooks allow inspecting:

Layer inputs/outputs
Gradients
Activations

They are commonly used for debugging, visualization, and feature extraction.
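
As an illustration, a forward hook can capture an intermediate activation. The model and the `save_output` helper below are hypothetical:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
activations = {}

def save_output(module, inputs, output):
    # Forward hooks receive the module, its positional inputs, and its output.
    activations["relu"] = output.detach()

handle = model[1].register_forward_hook(save_output)
model(torch.randn(5, 4))
handle.remove()   # remove the hook once it is no longer needed

print(activations["relu"].shape)
```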

19. How does PyTorch handle broadcasting?

Tensors with different shapes can be automatically expanded to match dimensions following NumPy broadcasting rules.

Example:

a + b  # if shapes are compatible
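
A concrete sketch of compatible and incompatible shapes (the values are arbitrary):

```python
import torch

a = torch.ones(3, 1)                 # shape (3, 1)
b = torch.tensor([10.0, 20.0])       # shape (2,), treated as (1, 2)

c = a + b                            # both are expanded to shape (3, 2)
print(c.shape)

try:
    torch.ones(3) + torch.ones(2)    # trailing sizes 3 and 2 are incompatible
except RuntimeError as err:
    print("broadcast error:", type(err).__name__)
```

Two dimensions are compatible when they are equal or when one of them is 1, compared from the trailing dimension backwards.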

20. How to perform gradient clipping?

Used to prevent exploding gradients:

torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
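
The call belongs between `loss.backward()` and `optimizer.step()`. In this sketch the loss is inflated on purpose so that clipping takes effect; the model and scale factor are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss = (model(torch.randn(4, 10)) * 1000).pow(2).mean()   # deliberately large loss
loss.backward()

# Returns the total norm measured before clipping; gradients are rescaled in place.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

print(float(norm_before), float(norm_after))
```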

Section C: Advanced-Level Questions

21. How does PyTorch's Autograd work internally?

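Each tensor produced by an operation on tracked tensors has a grad_fn that records how it was created; leaf tensors have grad_fn set to None
Backpropagation traverses the computation graph in reverse, from the output back to the leaves
The graph is freed after backward() unless retain_graph=True is passed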
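
These points can be observed on a two-node graph. A minimal sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)   # leaf tensor: created by the user
y = x * x                                   # result of an operation

print(x.grad_fn)   # None for leaf tensors
print(y.grad_fn)   # a backward node recorded for the multiplication

y.backward(retain_graph=True)   # keep the graph so backward can run again
y.backward()                    # second pass; gradients accumulate in x.grad
print(x.grad)
```

Because gradients accumulate, `x.grad` ends up holding twice the derivative 2x = 4, which is why training loops zero gradients between steps.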

22. What is the difference between TorchScript and eager mode?

| Mode | Description |
| --- | --- |
| Eager Mode | Pythonic, dynamic, easy to debug |
| TorchScript | Serialized, optimized, deployable (mobile, C++) |

A model is converted to TorchScript by tracing it (torch.jit.trace) or scripting it (torch.jit.script).

23. What is Distributed Data Parallel (DDP)?

DDP allows large-scale training across multiple GPUs or nodes.

Key features:

Efficient gradient synchronization
Scalable parallelism
Better performance than DataParallel

24. What is mixed-precision training?

Using a 16-bit format (FP16 or BF16) together with FP32 to:

Reduce memory usage
Improve speed
Maintain stability

With AMP:

from torch.cuda.amp import autocast, GradScaler
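
A single AMP training step might be sketched as follows. The model, data, and dtype choices are assumptions for illustration; the sketch falls back to CPU with bfloat16 and a disabled scaler when no GPU is present, and recent PyTorch releases also expose the same tools under `torch.amp`:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_cuda = device.type == "cuda"
# Assumption: float16 on GPU, bfloat16 on CPU.
amp_dtype = torch.float16 if use_cuda else torch.bfloat16

model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Loss scaling guards against float16 underflow; it is disabled off-GPU.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 4, device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device.type, dtype=amp_dtype):
    pred = model(x)                                   # runs in reduced precision
    loss = nn.functional.mse_loss(pred.float(), y)    # loss computed in float32

scaler.scale(loss).backward()   # scale the loss before backpropagation
scaler.step(optimizer)          # unscale gradients, then call optimizer.step()
scaler.update()                 # adjust the scale factor for the next iteration
print(pred.dtype, loss.item())
```

The parameters themselves stay in FP32; only selected operations inside the autocast region run in the lower-precision type.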

25. Explain custom loss function creation.

class MyLoss(nn.Module):
    def forward(self, pred, target):
        return torch.mean((pred - target)**2)

26. What is gradient accumulation?

Accumulate gradients over multiple batches to simulate larger batches.

loss = loss / accumulation_steps  # scale so the accumulated gradient is an average
loss.backward()
if (i+1) % accumulation_steps == 0:
    optimizer.step()
    optimizer.zero_grad()
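
In context, the fragment sits inside the batch loop. The following sketch uses a toy model and random batches; dividing the loss by `accumulation_steps` keeps the accumulated gradient comparable to that of one large averaged batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accumulation_steps = 4
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(8)]
updates = 0

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y) / accumulation_steps  # scale so the sum is an average
    loss.backward()                                   # gradients add up in .grad
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()        # one update per `accumulation_steps` mini-batches
        optimizer.zero_grad()
        updates += 1

print(updates)
```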

27. How do you detect vanishing or exploding gradients?

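Inspect gradient norms after loss.backward(). Passing max_norm=float('inf') makes clip_grad_norm_ return the total norm without rescaling any gradients:

total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float('inf'))
print(total_norm)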
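
To see which layers are responsible, norms can also be computed per parameter without modifying the gradients. A sketch with a hypothetical two-layer model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
model(torch.randn(16, 4)).pow(2).mean().backward()

# Per-parameter L2 norms: values near zero hint at vanishing gradients,
# very large values at exploding gradients.
grad_norms = {
    name: p.grad.norm().item()
    for name, p in model.named_parameters()
    if p.grad is not None
}
total_norm = sum(n ** 2 for n in grad_norms.values()) ** 0.5

for name, n in grad_norms.items():
    print(f"{name}: {n:.4e}")
print(f"total: {total_norm:.4e}")
```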

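28. What is DataParallel? Why is it slower than DDP?

DataParallel:

Runs in a single process and splits each batch across GPUs
Slow because the model is replicated on every forward pass, outputs are gathered on one GPU, and Python threads contend for the GIL

DDP:

Runs one process per GPU, so replicas execute in parallel
Synchronizes gradients through an efficient communication backend (e.g., NCCL) during the backward pass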
29. Explain the difference between tracing and scripting in TorchScript.

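| Type | When Used | Limitation |
| --- | --- | --- |
| Tracing | Fixed control flow (e.g., CNNs) | Records only the path taken by the example input; data-dependent loops and if-statements are not captured |
| Scripting | Models with data-dependent control flow | Supports only a subset of Python; slower to compile |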
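
The difference shows up as soon as a function branches on its input. In this sketch `branchy` is a made-up function; tracing it emits a warning and silently records one branch, while scripting preserves the `if`:

```python
import torch

def branchy(x):
    # Data-dependent branch: the path taken depends on the values in x.
    if x.sum() > 0:
        return x * 2
    return x + 10

pos = torch.ones(3)
neg = -torch.ones(3)

traced = torch.jit.trace(branchy, pos)   # records only the `x * 2` path (TracerWarning)
scripted = torch.jit.script(branchy)     # compiles the source, keeping the `if`

print(traced(neg))     # follows the recorded path even though the sum is negative
print(scripted(neg))   # takes the other branch
```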

30. How does PyTorch manage memory on GPUs?

Mechanisms:

Caching allocator
Asynchronous execution
Gradient buffer reuse

Common error:

CUDA out of memory

Fix:

Reduce batch size
Use mixed precision
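Release cached blocks that are no longer in use (this does not free memory held by live tensors, so it rarely fixes an out-of-memory error on its own):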

torch.cuda.empty_cache()

Section D: Real-World Scenario Questions

31. Your loss is not decreasing. How do you debug?

Checklist:

Check learning rate
Inspect preprocessing
Check data-label alignment
Visualize gradient norms
Overfit on a tiny batch
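
The last item is often the quickest sanity check: if the model cannot memorize a handful of samples, the bug is likely in the model, loss, or optimizer wiring rather than in the data volume. A sketch with a hypothetical classifier and random data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# One small, fixed batch; a healthy pipeline should be able to memorize it.
x = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```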

32. How to freeze layers in transfer learning?

for param in model.features.parameters():
    param.requires_grad = False
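
A fuller sketch, assuming a model with a `features` backbone and a `classifier` head (the `TinyNet` class is hypothetical). Passing only the trainable parameters to the optimizer is optional but keeps its state small:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical model with a `features` backbone and a `classifier` head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
        self.classifier = nn.Linear(16, 3)

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyNet()
for param in model.features.parameters():
    param.requires_grad = False          # backbone weights stay fixed

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

model(torch.randn(4, 8)).sum().backward()
print(len(trainable), model.features[0].weight.grad)
```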

33. Your model is overfitting. What do you do?

Solutions:

Increase dropout
Data augmentation
Use weight decay
Early stopping

34. How do you deploy a PyTorch model?

Options:

TorchScript
ONNX → TensorRT
PyTorch Mobile
FastAPI/Flask REST API

35. How do you profile a PyTorch model?

with torch.profiler.profile() as prof:
    output = model(x)

print(prof.key_averages().table(sort_by="cpu_time_total"))

Section E: Coding Challenges (With Expected Answers)

Challenge 1: Write PyTorch code to compute the gradient of a simple function.

import torch

x = torch.tensor(5.0, requires_grad=True)
y = x**2 + 3*x + 1
y.backward()     # computes dy/dx and stores it in x.grad
print(x.grad)    # Expected: dy/dx = 2x + 3 = 13 at x = 5

Challenge 2: Define a simple feed-forward neural network.

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.fc(x)

Challenge 3: Build a custom Dataset for images (for use with a DataLoader).

from PIL import Image
from torch.utils.data import Dataset

class ImageDataset(Dataset):
    def __init__(self, image_paths, transform=None):
        self.paths = image_paths
        self.transform = transform

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        if self.transform:
            img = self.transform(img)
        return img

    def __len__(self):
        return len(self.paths)

Conclusion

This Special Annexure 1 provides an interview-ready resource covering:

Fundamental to advanced PyTorch questions
Real-world debugging scenarios
Deployment and optimization
Coding challenges

This is suitable for:

Students
Researchers
ML engineers
Candidates preparing for interviews
Trainers and educators