Special Annexure 1: PyTorch Interview Questions and Answers (Basic to Advanced)

Abstract:

This annexure compiles curated technical interview questions frequently asked in academic, industrial, and research roles involving PyTorch. The questions span beginner, intermediate, and advanced levels, covering tensors, autograd, neural networks, optimization, GPU acceleration, deployment, and troubleshooting.


Section A: Basic-Level Questions


1. What is PyTorch?

PyTorch is an open-source deep learning framework originally developed by Facebook AI Research (now Meta AI) and currently governed by the PyTorch Foundation. It provides:

  • Dynamic computational graphs

  • Efficient tensor operations

  • Automatic differentiation

  • High flexibility for research and prototyping


2. What is a tensor in PyTorch?

A tensor is a multidimensional array, similar to a NumPy ndarray, that can be stored either in CPU memory or on an accelerator such as a CUDA GPU.

PyTorch tensors support GPU acceleration and can record the operations applied to them for autograd.


3. How do you create a tensor in PyTorch?

import torch
x = torch.tensor([1, 2, 3])  # 1-D integer tensor built from a Python list

Other methods:

  • torch.zeros()

  • torch.ones()

  • torch.randn()
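
As a brief illustration of these factory functions, the sketch below creates a few small tensors and inspects their shape and dtype. The variable names are chosen here for illustration only.

```python
import torch

zeros = torch.zeros(2, 3)                      # 2x3 tensor of 0.0, default dtype float32
ones = torch.ones(2, 3, dtype=torch.int64)     # dtype can be set explicitly
noise = torch.randn(2, 3)                      # samples from a standard normal distribution
counting = torch.arange(6).reshape(2, 3)       # values 0..5 laid out in two rows
like = torch.zeros_like(noise)                 # same shape and dtype as an existing tensor

print(zeros.shape, zeros.dtype)
print(ones.dtype, counting.tolist())
```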


4. What is the difference between NumPy arrays and PyTorch tensors?

Feature       | NumPy    | PyTorch
------------- | -------- | ------------
GPU Support   | ❌ No    | ✔ Yes (cuda)
Autograd      | ❌ No    | ✔ Yes
Deep Learning | Indirect | Native

5. How do you check if CUDA is available?

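torch.cuda.is_available()  # True when a usable CUDA device and driver are present
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")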

6. What is Autograd in PyTorch?

Autograd is PyTorch’s automatic differentiation engine.
It tracks operations and computes gradients for tensors with requires_grad=True.


7. How do you disable gradient calculation?

with torch.no_grad():
    output = model(x)

8. Difference between model.train() and model.eval()?

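Mode          | Purpose
------------- | -------
model.train() | Dropout is active; BatchNorm normalizes with batch statistics and updates its running statistics
model.eval()  | Dropout is disabled; BatchNorm normalizes with the stored running statistics

Neither call affects gradient tracking; combine model.eval() with torch.no_grad() for inference.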
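
The effect of the two modes can be seen on a single dropout layer. This is a minimal sketch; the layer and input are made up for the demonstration.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)   # roughly half the entries become 0, the rest are scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)    # dropout is a no-op in eval mode

print(train_out.unique(), torch.equal(eval_out, x))
```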

9. What is a DataLoader?

A DataLoader:

  • Loads data in batches

  • Supports shuffling

  • Can load batches in parallel with worker processes (num_workers)

from torch.utils.data import DataLoader
loader = DataLoader(dataset, batch_size=32, shuffle=True)
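
For a self-contained picture of batching, the sketch below wraps random tensors in a TensorDataset and iterates over the loader. The data is synthetic and stands in for a real dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(100, 4)            # 100 synthetic samples with 4 features each
labels = torch.randint(0, 2, (100,))      # synthetic binary labels
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=True)
batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]

# 100 samples with batch_size=32 give three full batches and one batch of 4.
print(len(batch_shapes), batch_shapes[-1])
```

Passing drop_last=True would discard the final, smaller batch.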

10. What is the purpose of an optimizer?

Optimizers update model parameters (weights) using gradients during training.
Examples:

  • SGD

  • Adam

  • RMSProp


Section B: Intermediate-Level Questions


11. What does backward() do?

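Calling backward() on a scalar loss computes the gradient of that loss with respect to every leaf tensor in the graph that has requires_grad=True and accumulates the result in each tensor's .grad attribute: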

loss.backward()

12. How do you update parameters in PyTorch?

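Parameters are updated with optimizer.step(), which must run after the gradients have been computed. A typical iteration is:

optimizer.zero_grad()  # clear gradients left over from the previous iteration
loss.backward()        # compute fresh gradients
optimizer.step()       # update the parameters using those gradients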
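
Put together, the optimizer, backward(), and step() calls form the standard training loop. The following is a minimal sketch on a synthetic regression problem; the model size, learning rate, and data are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(64, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])   # synthetic linear target

losses = []
for step in range(100):
    optimizer.zero_grad()            # reset accumulated gradients
    loss = loss_fn(model(x), y)      # forward pass
    loss.backward()                  # backward pass fills .grad
    optimizer.step()                 # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```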

13. What is a custom Dataset class?

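A user-defined dataset that inherits from torch.utils.data.Dataset. A map-style dataset implements __getitem__ (return one sample) and __len__ (number of samples).

class MyDataset(Dataset):
    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]  # assumes both were stored in __init__
    def __len__(self):
        return len(self.data)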

14. What is the purpose of collate_fn in DataLoader?

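collate_fn defines how the individual samples returned by the Dataset are merged into one batch. The default implementation stacks tensors of equal shape, so a custom function is needed when samples differ in size or type.

Useful for: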

  • Variable-length sequences (text)

  • Audio clips

  • Complex structures
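
For the variable-length case, one common approach is to pad each batch to its longest sequence. The sketch below assumes toy token-id sequences and a hypothetical helper named pad_collate; neither comes from a real dataset or library.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy data: token-id sequences of different lengths (a list works as a map-style dataset).
sequences = [
    torch.tensor([1, 2, 3]),
    torch.tensor([4, 5]),
    torch.tensor([6]),
    torch.tensor([7, 8, 9, 10]),
]

def pad_collate(batch):
    """Pad every sequence in the batch to the longest one and return the true lengths."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths

loader = DataLoader(sequences, batch_size=2, collate_fn=pad_collate)
batches = list(loader)
print(batches[0])
```

Keeping the original lengths alongside the padded tensor lets downstream code mask out the padding.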


15. Explain dynamic computation graph.

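PyTorch builds the computation graph on the fly as the forward code executes, so the graph can differ from one call to the next.
This means:

  • Flexible designs that use ordinary Python control flow

  • Easy debugging with standard tools such as print and pdb

  • A natural fit for models whose structure depends on the input, as is common in NLP and RL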
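
A small sketch of what "on-the-fly" means in practice: the module below (a made-up example named RepeatNet) applies its layer a number of times chosen at call time, and autograd handles each call with a freshly recorded graph.

```python
import torch
import torch.nn as nn

class RepeatNet(nn.Module):
    """Applies the same layer a number of times chosen at call time."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)

    def forward(self, x, n_steps):
        for _ in range(n_steps):          # plain Python loop; the graph is recorded per call
            x = torch.relu(self.layer(x))
        return x

net = RepeatNet()
x = torch.randn(2, 4)
out_short = net(x, n_steps=1)             # graph with one layer application
out_long = net(x, n_steps=3)              # graph with three layer applications
out_long.sum().backward()                 # gradients flow through all three applications
print(out_short.shape, net.layer.weight.grad.shape)
```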


16. How do you save and load models?

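Recommended method: save only the weights (the state_dict) and load them into a model instance with the same architecture:

torch.save(model.state_dict(), 'model.pth')
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
model.eval()  # switch dropout/batchnorm to inference behavior before predicting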
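
A round trip can be checked end to end as follows. This is a sketch that writes to a temporary directory; the tiny nn.Linear model is a placeholder for a real network.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), path)          # only parameters and buffers are written

restored = nn.Linear(4, 2)                    # must be built with the same architecture
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

x = torch.randn(3, 4)
same = torch.allclose(model(x), restored(x))  # both models now compute the same outputs
print(same)
```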

17. What are nn.Module and nn.functional?

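Component     | Description
------------- | -----------
nn.Module     | Base class for layers and models; an object that can hold parameters and buffers
nn.functional | Stateless functions (like F.relu); any weights are passed in as arguments

Example:

  • nn.Linear(3, 2) owns its weight and bias; F.linear(x, weight, bias) expects them as arguments

  • nn.ReLU() and F.relu() compute the same result; ReLU has no parameters, so the module form is mainly convenient inside containers such as nn.Sequential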
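
The relationship between the two APIs can be checked directly. In this sketch the functional call reuses the parameters owned by the module, so both paths produce the same output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(3, 2)                  # module: owns weight and bias
x = torch.randn(5, 3)

out_module = layer(x)
out_functional = F.linear(x, layer.weight, layer.bias)   # function: parameters passed in

relu_param_count = sum(p.numel() for p in nn.ReLU().parameters())
print(torch.allclose(out_module, out_functional), relu_param_count)
```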


18. What are hooks used for?

Hooks allow inspecting:

  • Layer inputs/outputs

  • Gradients

  • Activations

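Used for debugging, visualization, and feature extraction. Hooks are registered with methods such as register_forward_hook (on modules) and register_hook (on tensors).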
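
As an illustration, a forward hook can capture an intermediate activation without modifying the model. The model and the save_output helper below are made up for this sketch.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
activations = {}

def save_output(module, inputs, output):
    # Called after the module's forward; store a detached copy of its output.
    activations["relu"] = output.detach()

handle = model[1].register_forward_hook(save_output)
model(torch.randn(3, 4))
handle.remove()                      # remove the hook once it is no longer needed

print(activations["relu"].shape)
```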


19. How does PyTorch handle broadcasting?

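Tensors with different shapes can be automatically expanded to a common shape following NumPy broadcasting rules: shapes are compared from the trailing dimension backward, and each pair of sizes must be equal or one of them must be 1 (or missing).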

Example:

a + b  # if shapes are compatible
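
As a concrete illustration of broadcasting, the sketch below adds a column to a row and also shows a pair of shapes that cannot be broadcast.

```python
import torch

a = torch.arange(3).reshape(3, 1)     # shape (3, 1)
b = torch.arange(4).reshape(1, 4)     # shape (1, 4)
c = a + b                             # shape (3, 4), with c[i, j] == i + j

try:
    torch.ones(3, 2) + torch.ones(3)  # trailing sizes 2 and 3 differ and neither is 1
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True

print(c.shape, broadcast_failed)
```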

20. How to perform gradient clipping?

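Gradient clipping rescales the gradients when their total norm exceeds a threshold, which prevents exploding gradients. Call it after loss.backward() and before optimizer.step():

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)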

Section C: Advanced-Level Questions


21. How does PyTorch's Autograd work internally?

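  • Each tensor produced by an operation on tensors that require gradients has a grad_fn that records how it was computed; user-created leaf tensors have grad_fn=None

  • backward() traverses the computation graph in reverse, applying the chain rule and accumulating results in the .grad of leaf tensors

  • The graph's saved buffers are freed after backward unless retain_graph=True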
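
These points can be observed on a two-operation graph. The sketch below uses z = (3x)^2, whose derivative 18x equals 36 at x = 2.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)   # leaf tensor created by the user
y = x * 3                                    # produced by an operation, so it has a grad_fn
z = y ** 2

leaf_has_grad_fn = x.grad_fn is not None     # False for a leaf
op_has_grad_fn = y.grad_fn is not None       # True for an operation result

z.backward(retain_graph=True)                # keep the graph so backward can run again
first = x.grad.item()                        # dz/dx = 18 * x = 36
z.backward()                                 # second pass; gradients accumulate in x.grad
second = x.grad.item()                       # 36 + 36 = 72

print(first, second)
```

The doubling on the second call is the reason training loops reset gradients between iterations.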


22. What is the difference between TorchScript and eager mode?

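Mode        | Description
----------- | -----------
Eager Mode  | Pythonic, dynamic, easy to debug
TorchScript | Serialized, optimized, deployable without a Python interpreter (mobile, C++)

A TorchScript model is obtained by tracing (torch.jit.trace) or scripting (torch.jit.script) an eager model.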


23. What is Distributed Data Parallel (DDP)?

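DDP (torch.nn.parallel.DistributedDataParallel) trains one model replica per process, typically one process per GPU, across one or more nodes. Each process works on its own shard of the data, and gradients are averaged across processes during backward.

Key features:

  • Efficient gradient synchronization (all-reduce overlapped with the backward pass)

  • Scalable parallelism across GPUs and machines

  • Better performance than DataParallel, even on a single machine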


24. What is mixed-precision training?

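Mixed precision runs selected operations in a 16-bit format (FP16 or BF16) while keeping the model weights and numerically sensitive operations in FP32 to:

  • Reduce memory usage

  • Improve speed

  • Maintain numerical stability (FP16 additionally relies on loss scaling to avoid gradient underflow)

With AMP (automatic mixed precision), autocast selects the precision per operation and GradScaler performs the loss scaling:

from torch.cuda.amp import autocast, GradScaler  # recent releases also expose these under torch.amp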
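
A single AMP training step might look like the sketch below. It is written to also run without a GPU, under the assumption that BF16 autocast is used on CPU and that loss scaling is disabled there; the model and data are placeholders.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_cuda = device.type == "cuda"
amp_dtype = torch.float16 if use_cuda else torch.bfloat16   # assumption: BF16 on CPU

model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)         # becomes a pass-through when disabled

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 4, device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device.type, dtype=amp_dtype):
    pred = model(x)                                          # linear layer runs in 16-bit precision
    loss = nn.functional.mse_loss(pred.float(), y)           # loss computed in FP32
scaler.scale(loss).backward()                                # scaled backward pass (FP16 only)
scaler.step(optimizer)                                       # unscales, then steps the optimizer
scaler.update()

print(pred.dtype, model.weight.dtype)
```

The parameters stay in FP32 throughout; only the operations inside the autocast block run in reduced precision.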

25. Explain custom loss function creation.

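Subclass nn.Module and implement forward() using differentiable tensor operations:

import torch
import torch.nn as nn

class MyLoss(nn.Module):
    def forward(self, pred, target):
        return torch.mean((pred - target) ** 2)  # mean squared error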

26. What is gradient accumulation?

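Accumulate gradients over several small batches before each optimizer step to simulate a larger batch. Dividing each loss by accumulation_steps keeps the accumulated gradient equal to the average over the larger batch.

(loss / accumulation_steps).backward()  # gradients add up across calls
if (i + 1) % accumulation_steps == 0:
    optimizer.step()
    optimizer.zero_grad()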
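
The sketch below checks the idea numerically: with a mean-reduced loss and equally sized micro-batches, dividing each micro-batch loss by the number of accumulation steps reproduces the gradient of one large batch. The model and data are synthetic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 5), torch.randn(32, 1)
accumulation_steps = 4

# Reference: gradient from a single batch of 32 samples.
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()

# Accumulated: four micro-batches of 8, each loss divided by the step count.
model.zero_grad()
for xb, yb in zip(x.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    (loss_fn(model(xb), yb) / accumulation_steps).backward()
accumulated_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accumulated_grad, atol=1e-6))
```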

27. How do you detect vanishing or exploding gradients?

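Inspect gradient norms after loss.backward(). Norms that shrink toward zero suggest vanishing gradients; very large or NaN values suggest exploding gradients. clip_grad_norm_ returns the total norm, and passing max_norm=float('inf') measures it without changing the gradients:

total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float('inf'))
print(total_norm)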
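
To locate where gradients shrink or blow up, it can help to look at the norm of each parameter separately. A minimal sketch on a made-up two-layer model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()

# L2 norm of the gradient of each parameter, keyed by parameter name.
grad_norms = {
    name: p.grad.norm().item()
    for name, p in model.named_parameters()
    if p.grad is not None
}
total_norm = sum(n ** 2 for n in grad_norms.values()) ** 0.5   # overall L2 norm

for name, norm in grad_norms.items():
    print(f"{name}: {norm:.4e}")
```

Logging these values every few iterations makes trends across layers visible over training.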

28. What is a DataParallel? Why is it slower than DDP?

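DataParallel:

  • Runs in a single process and splits each batch across GPUs using threads

  • Slow because the model is replicated on every forward pass, inputs and outputs are scattered and gathered through one device, and the threads contend for the Python GIL

DDP:

  • Runs one process per GPU, so the replicas execute in parallel without sharing an interpreter

  • Uses an efficient communication backend (such as NCCL) to all-reduce gradients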


29. Explain the difference between tracing and scripting in TorchScript.

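Type      | When Used                               | Limitation
--------- | --------------------------------------- | ----------
Tracing   | Fixed control flow (CNNs)               | Records only the operations run for the example input, so data-dependent loops and if-statements are not captured
Scripting | Models with data-dependent control flow | Supports only a subset of Python; slower to compile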
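
The difference shows up on a function with a data-dependent branch. In this sketch, signed_scale is a made-up function; tracing with a positive example input bakes in one branch, while scripting keeps both. It is meant to be run as a script file, since scripting needs access to the function's source.

```python
import warnings

import torch

def signed_scale(x):
    if x.sum() > 0:          # data-dependent branch
        return x * 2
    return -x

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # tracing warns about converting a tensor to a Python bool
    traced = torch.jit.trace(signed_scale, torch.ones(3))   # records only the x * 2 path

scripted = torch.jit.script(signed_scale)                    # compiles both branches

neg = -torch.ones(3)
traced_out = traced(neg)       # replays x * 2, giving -2 everywhere
scripted_out = scripted(neg)   # evaluates the condition and returns -x, giving 1 everywhere

print(traced_out, scripted_out)
```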

30. How does PyTorch manage memory on GPUs?

Mechanisms:

  • Caching allocator

  • Asynchronous execution

  • Gradient buffer reuse

Common error:

CUDA out of memory

Fix:

  • Reduce batch size

  • Use mixed precision

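  • Release cached blocks that are no longer in use (this does not free memory held by live tensors, so it rarely resolves the error on its own):

torch.cuda.empty_cache()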

Section D: Real-World Scenario Questions


31. Your loss is not decreasing. How do you debug?

Checklist:

  • Check learning rate

  • Inspect preprocessing

  • Check data-label alignment

  • Visualize gradient norms

  • Overfit on a tiny batch
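
The last item on the checklist is a quick sanity test: a correctly wired model and training loop should be able to drive the loss down sharply on a handful of fixed samples. A minimal sketch with synthetic data and an arbitrary small model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 10)              # one tiny, fixed batch
y = torch.randint(0, 2, (8,))       # synthetic class labels

first_loss = loss_fn(model(x), y).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
final_loss = loss.item()

print(first_loss, final_loss)
```

If the loss does not fall on such a batch, the problem is more likely in the model, loss, or update code than in the amount of data.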


32. How to freeze layers in transfer learning?

for param in model.features.parameters():
    param.requires_grad = False
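
In context, freezing is usually paired with an optimizer that receives only the trainable parameters. The sketch below uses a hypothetical TinyNet with features and classifier submodules as a stand-in for a pretrained model.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained model with a feature extractor and a head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
        self.classifier = nn.Linear(16, 3)

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyNet()
for param in model.features.parameters():
    param.requires_grad = False                      # freeze the feature extractor

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)      # optimize only the head

model(torch.randn(4, 8)).sum().backward()
print(model.features[0].weight.grad, model.classifier.weight.grad.shape)
```

Frozen parameters receive no gradient, so their .grad stays None after the backward pass.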

33. Your model is overfitting. What do you do?

Solutions:

  • Increase dropout

  • Data augmentation

  • Use weight decay

  • Early stopping


34. How do you deploy a PyTorch model?

Options:

  • TorchScript

  • ONNX → TensorRT

  • PyTorch Mobile

  • FastAPI/Flask REST API


35. How do you profile a PyTorch model?

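from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:  # add ProfilerActivity.CUDA for GPU runs
    output = model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))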

Section E: Coding Challenges (With Expected Answers)


Challenge 1: Write a PyTorch code to compute gradients of a simple function.

import torch

x = torch.tensor(5.0, requires_grad=True)
y = x**2 + 3*x + 1
y.backward()
print(x.grad)   # tensor(13.), since dy/dx = 2x + 3 = 13 at x = 5

Challenge 2: Define a simple feed-forward neural network.

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.fc(x)

Challenge 3: Build a custom Dataset for images (to be wrapped in a DataLoader).

from PIL import Image
from torch.utils.data import Dataset

class ImageDataset(Dataset):
    def __init__(self, image_paths, transform=None):
        self.paths = image_paths
        self.transform = transform

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        if self.transform:
            img = self.transform(img)
        return img

    def __len__(self):
        return len(self.paths)

Conclusion

This Special Annexure 1 provides a complete interview-ready resource covering:

  • Fundamental to advanced PyTorch questions

  • Real-world debugging scenarios

  • Deployment and optimization

  • Coding challenges

This is suitable for:

  • Students

  • Researchers

  • ML engineers

  • Candidates preparing for interviews

  • Trainers and educators


