CS 533: Natural Language Processing (Spring 2023)
Instructor: Karl Stratos
Time and location: Tuesday 2-5pm at FBO-EHA
Textbook: None. The course will use self-contained slides/lecture notes and free online resources.
Course page: All lectures, assignments, and projects will be managed on the course Canvas page.
This graduate-level course will cover technical foundations of modern natural language processing (NLP).
The course will cast NLP as an application of machine learning, in particular deep learning,
and focus on deriving general scientific and engineering principles that underlie state-of-the-art NLP systems today.
Course objectives include:
- Understanding the goals, capabilities, and principles of NLP
- Acquiring mathematical tools to formalize NLP problems
- Acquiring implementation skills to build practical NLP systems
- Obtaining an ability to critically read and accurately evaluate conference papers in NLP
No previous exposure to deep learning or NLP is assumed.
However, the course will be most beneficial for students with some programming experience and familiarity with basic concepts in probability and statistics, calculus, and linear algebra.
Examples of such concepts include
- Random variables (continuous or discrete), expectation, mean/variance
- Matrix and vector operations
- Derivatives, partial derivatives, gradients
- Programming (in Python): familiarity with data structures and algorithms
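The prerequisite concepts above can be made concrete with a short NumPy sketch (NumPy itself is used only for illustration here and is not a prerequisite):

```python
import numpy as np

# Random variable: 10,000 draws from a fair six-sided die.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=10_000)
print(rolls.mean(), rolls.var())  # close to E[X] = 3.5 and Var[X] ~ 2.92

# Matrix and vector operations.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
print(A @ x)  # matrix-vector product: [-1., -1.]

# Gradient of f(x, y) = x^2 + x*y at (1, 2) is (2x + y, x) = (4, 1).
def grad_f(x, y):
    return np.array([2 * x + y, x])

print(grad_f(1.0, 2.0))
```

If each line here is readable to you, you have the background the course assumes.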
If you are an undergraduate, you must meet the requirements described in the
CS Honors Program
and submit a request form.
I will not be approving requests or giving out special permission numbers until it is closer to the beginning of the semester.
The prerequisites are as follows:
- Required: M250 (linear algebra), 112 (data structures), 206 (discrete II)
- Recommended: M251 (multivariable calculus), 533 (machine learning)
- Alternatives to 206: M477 (probability), S379 (basic probability theory), or instructor's permission
- Entrance quiz: 5%
- Assignments: 50%
- Quizzes: 15%
- Project: 30%
There will be an entrance quiz in the first class. It will help you assess if you have a suitable technical background.
Assignments are the heart of this course.
There will be around 4 assignments. Each assignment will have both written and programming components.
For the written component, you are required to use LaTeX to write up your solutions.
If you have never used LaTeX before, you can pick it up quickly (tutorial, style guide).
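If you are starting from scratch, a minimal write-up skeleton looks like the following (a generic example, not a required template):

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}  % standard math packages

\begin{document}

\section*{Problem 1}
Let $X$ be a discrete random variable. Then
\[
  \mathbb{E}[X] = \sum_x x \, \Pr(X = x).
\]

\end{document}
```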
For the programming component, you will implement and run your code online using Jupyter notebooks on Google Colab.
While this setup is slightly detached from the real-world setting (i.e., GitHub repositories),
it allows everyone to get started right away in a uniform software environment.
There will be 3 quizzes throughout the semester, each counting for 5% of the grade.
In the later part of the semester, you will work on a course project in conjunction with the usual coursework.
By that point, the course will have provided the basic knowledge needed to understand the current research landscape of the field.
You will select a recent paper (from a list prepared by the instructor, or by special permission), replicate and possibly build on its results,
present the work in class, and submit a final report.
The project will be due at the end of the semester and graded as follows:
- 5-minute presentation (10% of the grade): You must strictly adhere to the paper-writing tips by Jennifer Widom and clearly explain (1) what the problem is, (2) why it is interesting and important, (3) why it is hard, (4) why previous approaches fail, and (5) the key components of the paper's/your approach and results.
- 4-page report (20% of the grade): The report will be written and evaluated like a conference paper.
Academic integrity policy.
- Assignments: Collaboration is allowed and encouraged, as long as you (1) write up your solutions entirely on your own and (2) specify the names of the student(s) you collaborated with in your write-up.
If you find a solution online, clearly acknowledge the source and still write up the solution in your own words.
Copying solutions from others or from the internet is strictly prohibited.
- Quizzes: Cheating is strictly prohibited.
- Project: Collaboration up to 3 is allowed.
If a student is caught cheating or plagiarizing, the incident will be reported to the Office of Student Conduct, and the student will receive zero points for the assignment/quiz, which will result in a low final grade.
We will first cover fundamentals of deep learning, with a special emphasis on
- The universality of neural networks
- Cross-entropy loss and gradient-based optimization
- The transformer architecture
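As a small taste of the second item, here is a minimal sketch of the cross-entropy loss and one gradient-descent step for a toy linear classifier, written in plain NumPy (illustrative only; the course may use a different framework):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, y):
    """Negative log-probability assigned to the true class y."""
    return -np.log(softmax(logits)[y])

# Toy linear classifier for a 3-class problem: logits = W @ x.
x = np.array([1.0, 2.0])
y = 0                 # true class
W = np.zeros((3, 2))  # uniform predictions at initialization

logits = W @ x
loss = cross_entropy(logits, y)  # = log 3 under uniform predictions

# Gradient of the loss w.r.t. the logits is (softmax - one_hot),
# so dL/dW = outer(softmax - one_hot, x) by the chain rule.
p = softmax(logits)
p[y] -= 1.0
grad_W = np.outer(p, x)

W -= 0.1 * grad_W  # one step of gradient descent
print(cross_entropy(W @ x, y))  # smaller than the initial loss
```

The same loss and update rule, scaled up to deep networks and computed by backpropagation, drive essentially all of the models covered in the course.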
Then, we will apply these fundamentals to NLP tasks, especially focusing on the topics of
- Pretrained language models (a.k.a. "foundation models")
Most NLP tasks can be approached by applying pretrained language models and retrievers, including simple text classification tasks (e.g., sentiment analysis), machine translation, summarization, entity linking, and coreference resolution.
Additional topics include
- Latent-variable models
- Structured prediction problems (tagging, parsing)
Week 1 (Jan 17): General introduction, text classification, cross-entropy loss
Week 2 (Jan 24): Stochastic gradient descent, regularization, introduction to deep learning
Week 3 (Jan 31): Deep learning continued, backpropagation
Week 4 (Feb 7): Neural architectures for sequences: convolutional, recurrent, transformer
Week 5 (Feb 14): Language models, sequence-to-sequence models, machine translation
Week 6 (Feb 21): Pretrained language models, masked language modeling
Week 7 (Feb 28): Retrieval from a knowledge base, noise contrastive estimation, entity retrieval
Week 8 (Mar 7): Knowledge-intensive language tasks, question answering
Week 9 (Mar 21): Latent-variable models, variational autoencoder
Week 10 (Mar 28): Structured prediction, dynamic programming algorithms for tagging/parsing
Week 11 (Apr 4): Special topics: TBD (project proposal due)
Week 12 (Apr 11): Special topics: TBD
Week 13 (Apr 18): Special topics: TBD
Week 14 (Apr 25)
Free online resources:
- Natural Language Processing by Jacob Eisenstein
- A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg