CS 533: Natural Language Processing (NLP)
Coronavirus update.
All classes after Spring Recess (i.e., from March 25) will be livestreamed within the usual time slot until further notice.
The project presentations will also be livestreamed if in-person delivery is infeasible due to the coronavirus situation.
Please see Canvas for details on how to join online classes.
Instructor: Karl Stratos
TA: Zuohui Fu (office hours: Tuesday 3:30-4:30pm, Hill 273)
Time and location: Wednesday 12-3pm at BE 252
Instructor office hours: Wednesday 3:20-4:30pm at Tillett 111H
Course description.
This project-centered graduate course will cover technical foundations of modern NLP.
Students are expected to start working on course projects at the beginning of the course and to continue throughout,
culminating in (1) in-class project presentations and (2) written reports that aspire to conference publication level.
The course will have two parts that happen in parallel.
The first part consists of standard lecture-based classes in which the instructor introduces fundamental concepts and applications in the field.
The second part consists of continual discussions and brainstorming about course projects and self-initiated research efforts.
There is no required textbook: all materials are publicly available online resources.
Please use the Canvas site to ask questions regarding lectures/homeworks/projects, to submit assignments, and to find announcements.
Goals.
 Achieving an understanding of the foundational concepts and tools used in modern NLP
 Obtaining an ability to critically read and accurately evaluate conference papers in NLP
 Finding new research projects that persist beyond this course
Audience and prerequisites.
No previous exposure to NLP is assumed. However, this is a fast-paced course designed for self-motivated graduate or advanced undergraduate students with a solid technical background in probability and statistics, calculus, and linear algebra.
Technical requirements include:
 Probabilistic reasoning (e.g., What is the conditional probability of Y=y given X=x, assuming the knowledge of a joint distribution over X and Y?)
 Intimate and intuitive understanding of matrix and vector operations (e.g., What is the shape of a matrix product? How similar are two vectors?)
 Mathematical notions in optimization (e.g., What does it mean for a function to have zero derivative at a certain point?)
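As a rough self-check, the example questions above can be worked through in a few lines of Python. This is only an illustrative sketch and not part of any course assignment: the toy joint distribution and the helper names (conditional, matmul_shape, cosine_similarity, numerical_derivative) are made up for this example, and cosine similarity is just one common way to measure how similar two vectors are.

```python
import math

# Hypothetical joint distribution over X in {0, 1} and Y in {"a", "b"}.
joint = {(0, "a"): 0.1, (0, "b"): 0.3, (1, "a"): 0.2, (1, "b"): 0.4}


def conditional(joint, y, x):
    """P(Y=y | X=x) = P(X=x, Y=y) / P(X=x), with P(X=x) obtained by summing over y."""
    marginal_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return joint[(x, y)] / marginal_x


def matmul_shape(shape_a, shape_b):
    """Shape of the product AB for 2-D matrices; the inner dimensions must agree."""
    (m, k), (k2, n) = shape_a, shape_b
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    return (m, n)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    print(conditional(joint, "a", 0))             # P(Y=a | X=0)
    print(matmul_shape((2, 3), (3, 4)))           # a 2x3 times a 3x4 matrix
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
    # f(x) = (x - 2)^2 has zero derivative at its minimizer x = 2.
    print(numerical_derivative(lambda x: (x - 2) ** 2, 2.0))
```

A zero derivative marks a stationary point, which may be a minimum (as in the quadratic used here), a maximum, or neither; telling these apart is the kind of reasoning the optimization prerequisite refers to.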
If you cannot complete A1 comfortably, you may need to consult with the instructor about whether your background meets the prerequisites.
Significant programming experience in Python is necessary for programming assignments and course projects.
Grading.
 Assignments: 50% (10% per assignment)
 Project: 40% (written report 30%, presentation 10%)
 Participation: 10%
The assignment report must be written in LaTeX using the provided assignment report template.
Similarly, the project report must be written in LaTeX using the provided project report template and will be reviewed by the instructor like a conference submission.
Project timeline.
 Proposal (due 3/24) : submit an initial proposal using this template.
 Milestone (due 4/15): submit an informal 1-2 page progress report.
 Presentation (tentatively 4/29): in-class presentation
 Final report (due 5/4): submit a final report
Tentative plan.
Date 
Topics 
Readings 
Assignments 
Week 1 (January 22) 
Logistics, Introduction, Language Modeling 
Michael Collins' notes on n-gram models and log-linear models

A1 [code] (Due 2/4) 
Week 2 (January 29) 
Deep Learning for NLP: Neural Language Modeling 
Colah's blogs on deep learning and LSTMs,
NLM papers using feedforward (Bengio et al., 2003), recurrent (Mikolov et al., 2010; Melis et al., 2018),
and attention-based (GPT-2) architectures


Week 3 (February 5) 
Deep Learning for NLP: Conditional Neural Language Modeling 
BLEU, input-feeding attention, Google's NMT,
summarization,
copy mechanism,
data-to-text generation

A2 [code] (Due 2/18) 
Week 4 (February 12) 
Deep Learning for NLP: Backpropagation, Self-Attention, Representation Learning by Language Modeling
Backpropagation, Transformer (note), ELMo, BERT


Week 5 (February 19) 
Structured Prediction in NLP: Tagging 
Michael Collins' notes on
HMMs, CRFs, and forward-backward,
neural architectures for sequence labeling (Collobert et al., 2011; Lample et al., 2016)

A3 [code] (Due 3/10) 
Week 6 (February 26) 
Structured Prediction in NLP: Constituency and Dependency Parsing 
constituency parsing (Michael Collins' notes on PCFGs and inside-outside algorithm; Kitaev and Klein, 2018),
transition-based dependency parsing (Nivre, 2008; Chen and Manning, 2014),
graph-based dependency parsing (Eisner, 1996; Kiperwasser and Goldberg, 2016)


Week 7 (March 4) 
Unsupervised Learning in NLP: Latent-Variable Generative Models and the EM Algorithm
David McAllester's notes on EM,
EM for Naive Bayes model (MRS, 2019;
Michael Collins' notes),
EM for PCFGs (Lari and Young, 1990)


Week 8 (March 11) 
Unsupervised Learning in NLP: Autoencoders and VAEs 
Section 1 and Appendix A of this note,
useful facts about latent-variable generative models,
VAEs for NLP (Bowman et al., 2016; Pelsmaeker and Aziz, 2019)

A4 [code] (Due 3/31) 
Spring Recess 



Week 9 (March 25) 
Information Extraction in NLP 
Document weighting schemes (MRS Chapter 6, TF-IDF, BM25),
entity linking (Ling et al., 2015; Logeswaran et al., 2019; Gillick et al., 2019; Kolitsas et al., 2018),
retrieval-based question answering (Chen et al., 2017; Lee et al., 2019),
coreference resolution (Lee et al., 2017),
relation extraction (slides by Huck and Fraser)


Week 10 (April 1) 
Special Topics: Large-Scale Transfer Learning for NLP
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Proposal due 3/31, A5 (Due 4/21) 
Week 11 (April 8) 
Special Topics: Data Annotation and Question Answering 
BREAK It Down: A Question Understanding Benchmark, HOTPOTQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Week 12 (April 15) 
Milestone Presentations 

Milestone due 4/14 
Week 13 (April 22) 
Special Topics: Parallel Decoding, Discriminative vs Generative Models 
Mask-Predict: Parallel Decoding of Conditional Masked Language Models, Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

Week 14 (April 29) 
Project Presentations 


Other resources.
 Speech and Language Processing (3rd edition) by Dan Jurafsky and James H. Martin
 A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg
 Natural Language Processing by Jacob Eisenstein
