In NN03 we built a flexible MLP from scratch with NumPy. Now we’ll rebuild the same network in PyTorch so future chapters (CNNs, RNNs, LSTMs) can reuse a standard training loop and familiar building blocks.
Learning goals¶
By the end of this chapter you will be able to:
- Define a flexible MLP using nn.Module and nn.Sequential
- Use CrossEntropyLoss for multi-class classification
- Train with a simple PyTorch training loop
- Recognize how this maps directly to CNNs/RNNs/LSTMs later
1) The same architecture, now in PyTorch¶
The pattern is identical to our NumPy version: build a list of layers and wrap them in a module.
import torch
import torch.nn as nn
class MLP(nn.Module):
    def __init__(self, layer_sizes):
        super().__init__()
        layers = []
        for i in range(len(layer_sizes) - 1):
            layers.append(nn.Linear(layer_sizes[i], layer_sizes[i + 1]))
            # ReLU between hidden layers, but not after the final output layer
            if i < len(layer_sizes) - 2:
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
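A quick sanity check on shapes (the demo name is just for this snippet; the sizes match the network we train below):

demo = MLP([25, 16, 8, 2])
print(demo)                               # shows the Linear/ReLU stack
print(demo(torch.randn(4, 25)).shape)     # torch.Size([4, 2]): one logit pair per row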
2) Data: reuse the edge detector dataset¶
We can reuse the same NumPy dataset from NN03, then convert to tensors.
import numpy as np
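# NN03's make_data isn't defined in this chapter. The stand-in below is only
# an assumption about its interface (X: (n, 25) float32 patches, y: (n, 2)
# one-hot labels) so the chapter runs self-contained; use the real NN03
# helper if you have it.
def make_data(n, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 0.2, size=(n, 25)).astype(np.float32)
    labels = rng.integers(0, 2, size=n)         # class 1 = "edge present"
    X[labels == 1, 10:15] += 1.0                # brighten one row as a fake edge
    y = np.eye(2, dtype=np.float32)[labels]     # one-hot targets
    return X, y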
X_train, y_train = make_data(200)
X_test, y_test = make_data(50)
xb = torch.tensor(X_train, dtype=torch.float32)
# CrossEntropyLoss wants integer class indices rather than one-hot rows,
# so collapse the one-hot targets with argmax
yb = torch.tensor(y_train.argmax(axis=1), dtype=torch.long)
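Why integer labels and raw logits? nn.CrossEntropyLoss fuses log-softmax and negative log-likelihood into one call, so the model's last layer stays linear. A quick check of that equivalence (the tensor values here are arbitrary):

fake_logits = torch.randn(4, 2)
fake_targets = torch.tensor([0, 1, 1, 0])
ce = nn.CrossEntropyLoss()(fake_logits, fake_targets)
nll = nn.NLLLoss()(torch.log_softmax(fake_logits, dim=1), fake_targets)
assert torch.allclose(ce, nll)            # same loss, up to float error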
3) Training loop (standard PyTorch)¶
model = MLP([25, 16, 8, 2])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
for epoch in range(200):
    logits = model(xb)            # forward pass (full batch)
    loss = criterion(logits, yb)
    optimizer.zero_grad()         # clear gradients from the previous step
    loss.backward()               # backpropagate
    optimizer.step()              # SGD parameter update
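The test split from make_data(50) hasn't been used yet; here is a minimal evaluation sketch, assuming the same tensor conversion as for the training data:

with torch.no_grad():                           # no gradients needed for evaluation
    xt = torch.tensor(X_test, dtype=torch.float32)
    yt = torch.tensor(y_test.argmax(axis=1), dtype=torch.long)
    preds = model(xt).argmax(dim=1)             # predicted class per example
    acc = (preds == yt).float().mean().item()
print(f"test accuracy: {acc:.2f}")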
4) Why this unlocks future chapters¶
Once we have this PyTorch baseline:
- CNNs are just nn.Conv2d + pooling
- RNNs/LSTMs are built-in modules (nn.RNN, nn.LSTM)
- Training loops stay the same, so we can focus on architecture intuition (see the sketch below)
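To make that last point concrete, here is a sketch, not part of NN03/NN04 proper: the layer sizes are arbitrary, and we pretend each 25-feature row is a 1-channel 5x5 image so nn.Conv2d applies. The optimizer/criterion/loop from section 3 would train this model unchanged:

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # (B, 1, 5, 5) -> (B, 8, 5, 5)
    nn.ReLU(),
    nn.MaxPool2d(2),                            # -> (B, 8, 2, 2)
    nn.Flatten(),                               # -> (B, 32)
    nn.Linear(32, 2),                           # logits for the same 2 classes
)
logits = cnn(xb.view(-1, 1, 5, 5))              # reshape flat 25 features to 5x5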
In NN05 we’ll shift to architecture patterns that bridge into transformers.