Karl Stratos: Teaching: Special Topics Course on Spectral Techniques

Course number: TTIC 41000
Instructor: Karl Stratos
Time: M 3-4:20pm
Location: TTIC Room 526
Office hours: M 4:30-5pm

Course description. This special topics course will examine techniques that use eigenvalues/eigenvectors of a matrix (and more generally, any linear algebraic tools) to solve or understand problems in modern machine learning. The course will be accompanied by lectures on technical materials required to understand and derive spectral techniques. As part of the course, students will take turn to present papers on recent research in this area.

Goals.

Achieving an understanding of the foundational concepts and tools used in modern spectral methods
Obtaining an ability to accurately evaluate new works in this area at conferences
Finding new research projects that persist beyond this course

Audience and prerequisites. The expected audience is student researchers in machine learning who are interested in learning about and doing novel research on this topic. The main prerequisite is general mathematical maturity and familiarity with basic notions in linear algebra, statistics, and optimization.

Grading. The course will be pass-fail. Students who actively engage in the class will receive a pass. A significant part of this evaluation will also be the technical quality of the student presentation.

Tentative plan.

Oct 1	Introduction, Vector Space Review
Oct 3 (Wed)	Hilbert Space, Matrix Decomposition (Dudeja and Hsu, 2018)
Oct 8	Spectral Learning of HMMs (Hsu et al., 2008; Foster et al., 2012; Balle et al., 2014)
Oct 15	Canonical Correlation Analysis (Golub and Zha, 1992; Andrew et al., 2013) Some notes on deep CCA Comparing matrix ranges
Oct 22	Paper Discussion (Wang et al., 2015b)
Oct 29	Paper Discussion (Chapter 2 of Peyre and Cuturi)
Nov 5	Paper Discussion (Chapter 4 of Peyre and Cuturi; Peyre et al., 2016; Alvarez-Melis and Jaakkola, 2018; Zhang et al., 2017)
Nov 12	Paper Discussion (Von Luxburg, 2007; Belkin and Niyogi, 2002)
Nov 19	Paper Discussion (Sedoc et al., 2017; Halko et al., 2011)
Nov 26	Paper Discussion (Goodfellow et al., 2014; Arjovsky et al., 2017; Arora et al., 2018; Miyato et al., 2018)
Dec 3	Paper Discussion (Anandkumar et al., 2014; Sedghi et al., 2016)

Papers. Please expand recursively for a more complete list.

Topic	Papers
Efficient algorithms	Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions (Halko et al., 2009) Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis (Ma et al., 2015) Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations (Wang et al., 2015a) Stochastic Approximation for Canonical Correlation Analysis (Arora et al., 2017)
Analysis of CCA	The Canonical Correlations of Matrix Pairs and Their Numerical Computation (Golub and Zha, 1992) A Probabilistic Interpretation of Canonical Correlation Analysis (Bach and Jordan, 2005) Multi-View Regression via Canonical Correlation Analysis (Kakade and Foster, 2007) Multi-View Dimensionality Reduction via Canonical Correlation Analysis (Foster et al., 2008)
Subspace identification	A spectral algorithm for learning mixture models (Vempala and Wang, 2004) On Spectral Learning of Mixtures of Distributions (Achlioptas and McSherry, 2005) A Spectral Algorithm for Learning Hidden Markov Models (Hsu et al., 2008) Toward Learning Gaussian Mixtures with Arbitrary Separation (Belkin and Sinha, 2010) Spectral dimensionality reduction for HMMs (Foster et al., 2012) Learning mixtures of spherical Gaussians: moment methods and spectral decompositions (Hsu and Kakade, 2012) Hilbert Space Embeddings of Predictive State Representations (Boots et al., 2013) Generalized Topic Modeling (Blum and Haghtalab, 2016) Multitask Spectral Learning of Weighted Automata (Rabusseau et al., 2017) Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Rabusseau et al., 2018) Learning Single-Index Models in Gaussian Space (Dudeja and Hsu, 2018)
Tensor decomposition	Tensor Decompositions for Learning Latent Variable Models (Anandkumar et al., 2014) Provable Tensor Methods for Learning Mixtures of Generalized Linear Models (Sedghi et al., 2016)
Optimal transport	Wasserstein GAN (Arjovsky et al., 2017) Gromov-Wasserstein Averaging of Kernel and Distance Matrices (Peyre et al., 2016) Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction (Zhang et al., 2017) Gromov-Wasserstein Alignment of Word Embedding Spaces (Alvarez-Melis and Jaakkola, 2018)
Neural networks	Deep Canonical Correlation Analysis (Andrew et al., 2013) Neural Word Embedding as Implicit Matrix Factorization (Levy and Goldberg, 2014) On Deep Multi-View Representation Learning (Wang et al., 2015b) Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al., 2016) Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (Yang et al., 2017) Spectral Normalization for Generative Adversarial Networks (Miyato et al., 2018)
Clustering/graph cuts	Normalized Cuts and Image Segmentation (Shi and Malik, 2000) Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering (Belkin and Niyogi, 2002) Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (Zhu et al., 2003) K-means Clustering via Principal Component Analysis (Ding and He 2004) A Tutorial on Spectral Clustering (Von Luxburg, 2007) Semantic Word Clusters Using Signed Spectral Clustering (Sedoc et al., 2017)
Kernel approximation	Using the Nystrom Method to Speed Up Kernel Machines (Williams and Seeger, 2000) Random Features for Large-Scale Kernel Machines (Rahimi and Recht, 2007)
Misc.	A Generalization of Principal Component Analysis to the Exponential Family (Collins et al., 2002) A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data (Ando and Zhang, 2005) A useful variant of the Davis-Kahan theorem for statisticians (Yu et al., 2014) Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path (Hsu et al., 2015) Generative Adversarial Nets (Goodfellow et al., 2014) Do GANs Learn the Distribution? Some Theory and Empirics (Arora et al., 2018)

Other resources.

Textbooks	Linear Algebra (Friedberg et al., 2002) Computational Optimal Transport (Peyre and Cuturi, 2018)
Videos	A Primer on Optimal Transport Linear Algebra: An In-Depth Introduction (Part 1-4)

Special Topics: Spectral Techniques for Machine Learning