Karl Stratos
Technical Notes
Efficient attention
Randomized algorithms
Adaptive learning methods
Position embedding
Speculative decoding
Numerical precision for deep learning
Vision architectures
Information theory
The Poisson distribution
The Gaussian distribution from scratch
Multi-armed bandits
Sequence alignment
Diffusion models
Linear discriminant analysis
Writing technical notes
Learning to rank
Indexes for efficient search
The Monty Hall problem
Statistical significance testing from scratch
Boosting as coordinate descent
AdaBoost
Pegasos
Lagrangian relaxation
Fisher information and policy gradient methods
The alias method
Useful facts about latent-variable generative models
Invariant risk minimization (IRM)
The ON-LSTM and PRPN architectures
Noise contrastive estimation (NCE)
Generalized birthday paradox
Smart pointers in C++11
Notes on Concentration Inequalities: variance analysis
Notes on Concentration Inequalities: Chernoff
Best-match subspaces
Deep CCA
Variable elimination and belief propagation in graphical models
Backpropagation [code]
Variational and information theoretic methods
The Transformer architecture
Local algorithms for interactive clustering
Clustering with interactive feedback
Learning with feature feedback
Generalized topic modeling
Constrained optimization (KKT)
Transition-based dependency parsing (older note)
Graph-based dependency parsing
Descent methods
PAC learnability
k-means clustering
Spectral framework (part I of my dissertation)
Hoeffding, Azuma, McDiarmid bounds
Online convex optimization
Feedforward and recurrent neural networks
The Frank-Wolfe algorithm
Projections onto linear subspaces
The expectation-maximization (EM) algorithm
Ando and Zhang (2005)
Max margin training (a.k.a. support vector machines)
Conditional random fields (CRFs)
Approximate CCA
A Hitchhiker's Guide to PCA and CCA
The Lorentz transformation
Inside-outside algorithm for PCFGs
Forward-backward algorithm for HMMs
© 2010–2024 Karl Stratos