Every transformation visualized — from raw text to trained weights
words = open('names.txt').read().splitlines()
The file contains one name per line, so read the whole file and split it on newlines to get a list of names.
[Figure: the first 20 names from names.txt]
chars = sorted(set(''.join(words)))
stoi = {ch: i for i, ch in enumerate(chars)}
stoi['.'] = 26
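Collect the unique characters, sort them, and give each one an integer index (with all 26 lowercase letters present, a through z map to 0 through 25). The "." character is the boundary token, index 26.
[Figure: index mapping, every character ↔ integer]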
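A dependency-free sketch of the vocabulary step, using a small hypothetical word list in place of names.txt. Unlike the page's code, it gives "." the next free index, `len(chars)`, instead of a hard-coded 26; the two agree whenever all 26 letters occur in the data. The inverse map `itos` is an addition here (not shown on the page), handy for turning indices back into characters.

```python
# Hypothetical stand-in for open('names.txt').read().splitlines()
words = ["emma", "olivia", "ava"]

# Unique characters across all words, sorted alphabetically
chars = sorted(set("".join(words)))

# Character -> integer index
stoi = {ch: i for i, ch in enumerate(chars)}
# Boundary token takes the next free index (26 when all of a-z occur)
stoi["."] = len(chars)

# Integer index -> character (inverse map, an addition for illustration)
itos = {i: ch for ch, i in stoi.items()}

print(chars)      # ['a', 'e', 'i', 'l', 'm', 'o', 'v']
print(stoi["."])  # 7
```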
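chs = ['.'] + list(word) + ['.']  # named chs so the vocabulary list chars is not overwritten
Wrap each word in "." boundary tokens so the model learns which letters start and end names.
[Figure: a name before and after prepending and appending "."]
for x, y in zip(chs, chs[1:]):
    xs.append(stoi[x])  # input character index
    ys.append(stoi[y])  # target (next character) index
Slide a two-character window across the wrapped list. Each position yields one (input, target) pair, so a name with n letters produces n + 1 pairs.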
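A minimal sketch of the wrapping and sliding-window steps for a single word. It returns character pairs rather than indices so the output is easy to read; the function name `bigram_pairs` is hypothetical.

```python
def bigram_pairs(word):
    """Return the (input, target) character pairs for one word."""
    # Wrap the word in boundary tokens: "emma" -> ['.', 'e', 'm', 'm', 'a', '.']
    chs = ["."] + list(word) + ["."]
    # zip pairs each character with the one after it (a 2-character window)
    return list(zip(chs, chs[1:]))

pairs = bigram_pairs("emma")
print(pairs)       # [('.', 'e'), ('e', 'm'), ('m', 'm'), ('m', 'a'), ('a', '.')]
print(len(pairs))  # 5, i.e. len("emma") + 1
```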
[Figure: sliding window over "emma"]
[Figure: all extracted pairs]
Look up each character in the index map from Step 2.
| # | Input | → | Index | Target | → | Index |
|---|---|---|---|---|---|---|
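The lookup can be sketched as follows for the word "emma", assuming (as the hard-coded `stoi['.'] = 26` implies) that the names use exactly the lowercase letters a through z. The inputs begin with "." (26) and the targets end with it.

```python
import string

# Assumption: the vocabulary is exactly a-z, so a -> 0 ... z -> 25, and "." -> 26
stoi = {ch: i for i, ch in enumerate(string.ascii_lowercase)}
stoi["."] = 26

xs, ys = [], []
chs = ["."] + list("emma") + ["."]
for x, y in zip(chs, chs[1:]):
    xs.append(stoi[x])  # input character index
    ys.append(stoi[y])  # target (next character) index

print(xs)  # [26, 4, 12, 12, 0]
print(ys)  # [4, 12, 12, 0, 26]
```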
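import torch
x_encoded = torch.nn.functional.one_hot(torch.tensor(xs), num_classes=27).float()
Each integer becomes a 27-element vector: all zeros except a 1 at that character's index. one_hot needs an integer tensor (hence torch.tensor(xs), since xs was built as a Python list) and returns integers, so .float() converts the result for multiplication with the float weight matrix W.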
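A pure-Python sketch of what the one-hot step produces for a list of indices, without PyTorch. `one_hot` here is a hypothetical helper, not a library function; it returns floats to mirror the `.float()` call.

```python
def one_hot(indices, num_classes=27):
    """Return one row per index: 1.0 at that index, 0.0 everywhere else."""
    rows = []
    for ix in indices:
        row = [0.0] * num_classes
        row[ix] = 1.0
        rows.append(row)
    return rows

x_encoded = one_hot([26, 4, 12, 12, 0])   # the inputs for "emma"
print(len(x_encoded), len(x_encoded[0]))  # 5 27
print(x_encoded[1].index(1.0))            # 4 ('e')
```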
[Figure: each input → one-hot row (green = activated bit)]
Trace the math for one pair through the full forward pass.
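Multiplying the one-hot vector by W (27 × 27) picks out the row of W at the input character's index. That row is the logits: 27 raw scores, one for each possible next character.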
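To see why the product reduces to a row lookup, the sketch below multiplies a one-hot vector by a randomly initialized 27 × 27 matrix in plain Python and compares the result with indexing the row directly. The standard-normal initialization with seed 0 is an assumption for illustration; in the sum, every term except the one at the input's index is multiplied by zero.

```python
import random

rng = random.Random(0)
n = 27
# Hypothetical weights: 27 x 27 standard-normal values
W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

def vec_mat(x, M):
    """Row vector x (length n) times matrix M (n x n)."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

ix = 4  # input character 'e'
x = [0.0] * n
x[ix] = 1.0  # one-hot encoding of 'e'

logits = vec_mat(x, W)
print(logits == W[ix])  # True: the product is just row 4 of W
```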
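Exponentiate every logit so all values become positive and can be read as counts: a negative logit gives a count between 0 and 1, a positive logit a count above 1.
Divide each count by the sum of all 27 so they add up to 1. Exponentiating and then normalizing together make up the softmax operation.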
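The two steps, exponentiate then normalize, can be sketched in plain Python as follows (the function name `softmax` is just a label for this sketch):

```python
import math

def softmax(logits):
    """Exponentiate each logit, then normalize so the values sum to 1."""
    counts = [math.exp(v) for v in logits]  # all positive
    total = sum(counts)
    return [c / total for c in counts]

print(softmax([0.0, 0.0]))            # [0.5, 0.5]
probs = softmax([-1.0, 0.0, 2.0])
print(abs(sum(probs) - 1.0) < 1e-12)  # True
```

A common refinement is to subtract the largest logit before exponentiating; it leaves the result mathematically unchanged but avoids overflow in `exp` for large logits.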
[Figure: probability distribution over the next character (target = green, top 12 shown)]
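How surprised is the model? The loss is the negative log of the probability assigned to the correct target, so a lower probability means a higher loss (probability 1 gives loss 0).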
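The forward pass for one pair, with input index $i$, target index $y$, one-hot input $x$ and 27 × 27 weight matrix $W$, can be written out as below. The page does not show how the per-example losses are combined; averaging over the $N$ examples, as is common, is an assumption here.

```latex
\begin{aligned}
\text{logits} &= x W = W_{i,:} \\
\text{counts}_j &= e^{\text{logits}_j} \\
p_j &= \frac{\text{counts}_j}{\sum_{k=0}^{26} \text{counts}_k} \\
\ell &= -\log p_y \\
L &= \frac{1}{N} \sum_{n=1}^{N} \ell_n
\end{aligned}
```

As a worked number: a model that spreads probability uniformly gives $p_y = 1/27$ and $\ell = \log 27 \approx 3.30$, a useful baseline for an untrained model.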
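W.grad = None              # reset: PyTorch accumulates gradients across backward() calls
loss.backward()            # backpropagation fills W.grad with d(loss)/dW
W.data += -0.001 * W.grad  # gradient descent step with learning rate 0.001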
Backpropagation computes how much each of the 729 weights (27 × 27) influenced the loss; the update then nudges each weight a small step in the direction that reduces the loss.
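For softmax followed by negative log-likelihood, the gradient that `loss.backward()` fills in has a simple closed form. For one pair with input index $i$ and target $y$, only row $i$ of $W$ receives any gradient, where $[j = y]$ is 1 if $j = y$ and 0 otherwise:

```latex
\frac{\partial \ell}{\partial W_{r,j}} =
\begin{cases}
p_j - [\,j = y\,] & \text{if } r = i \\
0 & \text{if } r \ne i
\end{cases}
```

A step against this gradient raises the target's logit and lowers every other logit in that row in proportion to its probability. If the loss is averaged over $N$ examples (an assumption, as the page does not say), each pair's contribution is divided by $N$.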
Repeat steps 7–8 for 150 epochs over all examples.
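Putting the pieces together, a dependency-free sketch of the training loop over a tiny hypothetical word list. It computes the softmax/negative-log-likelihood gradient by hand (probability minus 1 for the target, the probability itself otherwise, only in the input's row) in place of `loss.backward()`. Starting from all-zero weights, averaging the loss over all pairs, and the learning rate of 1.0 are assumptions chosen for this toy data, not the page's settings (it uses 0.001).

```python
import math
import string

def train_bigram(words, epochs=150, lr=1.0):
    """Train a 27 x 27 bigram weight matrix by full-batch gradient descent.

    Assumptions (not from the original page): weights start at zero, the loss
    is the mean negative log-likelihood over all pairs, and the gradient is
    computed by hand instead of with autograd.
    """
    stoi = {ch: i for i, ch in enumerate(string.ascii_lowercase)}
    stoi["."] = 26
    n = 27

    # Build (input, target) index pairs for every word
    xs, ys = [], []
    for word in words:
        chs = ["."] + list(word) + ["."]
        for x, y in zip(chs, chs[1:]):
            xs.append(stoi[x])
            ys.append(stoi[y])

    W = [[0.0] * n for _ in range(n)]
    losses = []
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(n)]  # fresh gradient every epoch
        total_loss = 0.0
        for x, y in zip(xs, ys):
            logits = W[x]                      # one-hot @ W is just row x
            counts = [math.exp(v) for v in logits]
            s = sum(counts)
            probs = [c / s for c in counts]
            total_loss += -math.log(probs[y])
            # d(loss)/d(logits) = probs - one_hot(y); only row x is affected
            for j in range(n):
                grad[x][j] += (probs[j] - (1.0 if j == y else 0.0)) / len(xs)
        losses.append(total_loss / len(xs))
        # Gradient descent step on all 729 weights
        for r in range(n):
            for j in range(n):
                W[r][j] -= lr * grad[r][j]
    return W, losses

W, losses = train_bigram(["emma", "olivia", "ava"])
print(round(losses[0], 4))     # 3.2958 (= log 27: all weights start equal)
print(losses[-1] < losses[0])  # True
```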