UCSC · CSE 143 · Work in Progress

Natural Language Processing

Notes from CSE 143 at UCSC, covering tokenisation, language models, sequence-to-sequence architectures, attention, and transformers.

Topics Covered

Tokenisation and text preprocessing
N-gram language models
Recurrent neural networks (RNNs, LSTMs)
Attention mechanisms
Transformers and BERT

Useful Links

Stanford CS224N — great companion lectures
The Illustrated Transformer — visual explainer

Textbook

Speech and Language Processing — Jurafsky & Martin