I love the challenge of distilling a dense concept into its very essence, to make the core ideas crystal clear and accessible (to myself). I habitually write up technical notes to remember what I learn.
Disclaimer: These notes are completely informal and not checked for correctness. I just use them as a quick personal reference.
All of Backpropagation in Two Pages.
Variational and Information Theoretic Methods.
The Transformer Architecture.
Awasthi et al. (2014).
Balcan and Blum (2008).
Poulis and Dasgupta (2017).
Blum and Haghtalab (2016).
Constrained optimization (KKT).
Lectures on selected topics in the spring 2017 NLP class at Columbia.
Spectral framework. Covers spectral decomposition, numerical algorithms, perturbation theory, and modern applications. Subsumes some of the previous notes.
Hoeffding, Azuma, and McDiarmid bounds.
Online convex optimization.
Feedforward and recurrent neural networks.
The Frank-Wolfe algorithm.
Projections onto linear subspaces.
The expectation-maximization (EM) algorithm.
Ando and Zhang (2005).
Max-margin training (a.k.a. support vector machines).
Conditional random fields (CRFs).
Transition-based dependency parsing.
A Hitchhiker's Guide to PCA and CCA.
The Lorentz transformation.
Inside-outside algorithm for PCFGs.
Forward-backward algorithm for HMMs.