CS388: Natural Language Processing (online MS version)

These are the course materials for an online master's course in NLP. All lectures are available as videos on YouTube.

Note on enrollment for on-campus students: This course is listed in the course catalog as "Natural Language Processing-WB". It is a partially asynchronous course taught for certain online master's programs at UT ("Option III" programs, as the university calls them). If you are a student enrolled on-campus at UT Austin, you are not eligible to take this course; this is a hard requirement from the university because the course is part of an Option III program. There is an on-campus version of CS388, typically taught once per year by me, Eunsol Choi, or Ray Mooney, which you are eligible to take (or CS371N if you're an undergraduate student). Regardless, you are free to consult the materials here!

Assignments

Assignment 1: Linear Sentiment Classification [code and dataset download] [see edX for code walkthrough and debugging tips]

Assignment 2: Feedforward Neural Networks, Word Embeddings, and Generalization [code and dataset download] [see edX for code walkthrough and debugging tips]

Assignment 3: Transformer Language Modeling [code and dataset download] [see edX for code walkthrough and debugging tips]

Assignment 4: Factuality and ChatGPT [code and dataset download]

Final Project: Dataset Artifacts [code and dataset download] [example 1] [example 2] [peer assessment instructions]

Lecture Videos and Readings

YouTube playlist containing all videos

Download the slides and handwritten notes here (88MB tgz)

Topics and Videos / Readings
Week 1: Intro and Linear Classification
Course Preview
Introduction
Note: this introduction video is from an older run of the class and references an outdated schedule. Please refer to the new course structure here.
Linear Binary Classification Eisenstein 2.0-2.5, 4.2-4.4.1

Perceptron and logistic regression
Sentiment Analysis and Basic Feature Extraction Eisenstein 4.1
Basics of Learning, Gradient Descent
Perceptron
Perceptron as Minimizing Loss
Logistic Regression Perceptron and LR connections
Sentiment Analysis Thumbs up? Sentiment Classification using Machine Learning Techniques Bo Pang et al., 2002

Baselines and Bigrams: Simple, Good Sentiment and Topic Classification Sida Wang and Christopher Manning, 2012

Convolutional Neural Networks for Sentence Classification Yoon Kim, 2014

[GitHub] NLP Progress on Sentiment Analysis
Optimization Basics
Week 2: Multiclass and Neural Classification
Multiclass Classification Eisenstein 4.2

Multiclass lecture note
Multiclass Perceptron and Logistic Regression
Multiclass Classification Examples A large annotated corpus for learning natural language inference Sam Bowman et al., 2015

Authorship Attribution of Micro-Messages Roy Schwartz et al., 2013
Fairness in Classification 50 Years of Test (Un)fairness: Lessons for Machine Learning Ben Hutchinson and Margaret Mitchell, 2018

[Article] Amazon scraps secret AI recruiting tool that showed bias against women
Neural Networks
Neural Network Visualization [Blog] Neural Networks, Manifolds, and Topology Chris Olah
Feedforward Neural Networks, Backpropagation Eisenstein Chapter 3.1-3.3
Neural Net Implementation
Neural Net Training, Optimization Dropout: a simple way to prevent neural networks from overfitting Nitish Srivastava et al., 2014

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Sergey Ioffe and Christian Szegedy, 2015

Adam: A Method for Stochastic Optimization Durk Kingma and Jimmy Ba, 2015

The Marginal Value of Adaptive Gradient Methods in Machine Learning Ashia Wilson et al., 2017
Week 3: Word Embeddings
Word Embeddings
Skip-gram Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov et al., 2013
Other Word Embedding Methods A Scalable Hierarchical Distributed Language Model Andriy Mnih and Geoff Hinton, 2008

Neural Word Embedding as Implicit Matrix Factorization Omer Levy and Yoav Goldberg, 2014

GloVe: Global Vectors for Word Representation Jeffrey Pennington et al., 2014

Enriching Word Vectors with Subword Information Piotr Bojanowski et al., 2016
Bias in Word Embeddings Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings Tolga Bolukbasi et al., 2016

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings Thomas Manzini et al., 2019

Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them Hila Gonen and Yoav Goldberg, 2019
Applying Embeddings, Deep Averaging Networks Deep Unordered Composition Rivals Syntactic Methods for Text Classification Mohit Iyyer et al., 2015
Week 4: Language Modeling and Self-Attention
n-gram LMs Eisenstein 6.1
Smoothing in n-gram LMs Eisenstein 6.2
LM Evaluation Eisenstein 6.4
Neural Language Models
RNNs and their Shortcomings Eisenstein 6.3

[Blog] Understanding LSTMs Chris Olah
Attention Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau et al., 2015
Self-Attention Attention Is All You Need Ashish Vaswani et al., 2017
Multi-Head Self-Attention Attention Is All You Need Ashish Vaswani et al., 2017

[Blog] The Illustrated Transformer Jay Alammar
Position Encodings Attention Is All You Need Ashish Vaswani et al., 2017

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press et al., 2021

The Impact of Positional Encoding on Length Generalization in Transformers Amirhossein Kazemnejad et al., 2023
Week 5: Transformers and Decoding
Transformer Architecture Attention Is All You Need Ashish Vaswani et al., 2017
Using Transformers
Transformer Language Modeling
Transformer Extensions Scaling Laws for Neural Language Models Jared Kaplan et al., 2020

Efficient Transformers: A Survey Yi Tay et al., 2020

Rethinking Attention with Performers Krzysztof Choromanski et al., 2021

Longformer: The Long-Document Transformer Iz Beltagy et al., 2020
Beam Search
Nucleus Sampling The Curious Case of Neural Text Degeneration Ari Holtzman et al., 2019
Week 6: Pre-training, seq2seq LMs
BERT: Masked Language Modeling BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin et al., 2019
BERT: Model and Applications BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin et al., 2019

To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks Matthew Peters et al., 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang et al., 2019

What Does BERT Look At? An Analysis of BERT's Attention Kevin Clark et al., 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu et al., 2019
Seq2seq Models
BART BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Mike Lewis et al., 2019
T5 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Colin Raffel et al., 2020

UnifiedQA: Crossing Format Boundaries With a Single QA System Daniel Khashabi et al., 2020

Word Piece and Byte Pair Encoding Neural Machine Translation of Rare Words with Subword Units Rico Sennrich et al., 2016

Byte Pair Encoding is Suboptimal for Language Model Pretraining Kaj Bostrom and Greg Durrett, 2020
Week 7-8: Structured Prediction: Part-of-speech, Syntactic Parsing
Note: this unit was previously presented as Week 4, right after classification, so there are a few references to it being our first brush with structured models. In the current structure of the course, it is still our first exposure to models that deal with linguistic structure, as opposed to surface-level sequential structure (i.e., token sequences in generation).
Part-of-Speech Tagging Eisenstein 8.1
Sequence Labeling, Tagging with Classifiers Eisenstein 7.1
Hidden Markov Models Eisenstein 7.4
HMMs: Parameter Estimation Eisenstein 7.4.1
HMMs: Viterbi Algorithm Eisenstein 7.3
HMMs for POS Tagging TnT - A Statistical Part-of-Speech Tagger Thorsten Brants, 2000

Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger Kristina Toutanova and Christopher Manning, 2000

Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? Christopher Manning, 2011

Natural Language Processing with Small Feed-Forward Networks Jan Botha et al., 2017
Constituency Parsing Eisenstein 10.1-10.2
Probabilistic Context-Free Grammars Eisenstein 10.3-10.4
CKY Algorithm Eisenstein 10.3.1
Refining Grammars Accurate Unlexicalized Parsing Dan Klein and Chris Manning, 2003

Eisenstein 10.5
Dependencies Eisenstein 11.1

Finding Optimal 1-Endpoint-Crossing Trees Emily Pitler et al., 2013
Transition-based Dependency Parsing Eisenstein 11.3
Week 9: Modern Large Language Models
GPT-3 Language Models are Unsupervised Multitask Learners Alec Radford et al., 2019

Language Models are Few-Shot Learners Tom B. Brown et al., 2020

Llama 2: Open Foundation and Fine-Tuned Chat Models Hugo Touvron et al., 2023

Llama 2 is one of the latest models with publicly available weights (although it is not fully open-source, as many details of the training are not public).
Zero-shot Prompting Demystifying Prompts in Language Models via Perplexity Estimation Hila Gonen et al., 2022
Few-shot Prompting Calibrate Before Use: Improving Few-Shot Performance of Language Models Tony Z. Zhao et al., 2021

Holistic Evaluation of Language Models Percy Liang et al., 2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Sewon Min et al., 2022
Understanding ICL: Induction Heads In-context Learning and Induction Heads Catherine Olsson et al., 2022
Instruction Tuning Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh et al., 2021

Scaling Instruction-Finetuned Language Models Hyung Won Chung et al., 2022
Reinforcement Learning from Human Feedback (RLHF) Training language models to follow instructions with human feedback Long Ouyang et al., 2022

[Website] Stanford Alpaca: An Instruction-following LLaMA Model Rohan Taori et al., 2023
Factuality of LLMs Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation Yixin Liu et al., 2023

WiCE: Real-World Entailment for Claims in Wikipedia Ryo Kamoi et al., 2023

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization Philippe Laban et al., 2022

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation Sewon Min et al., 2023

RARR: Researching and Revising What Language Models Say, Using Language Models Luyu Gao et al., 2022
Week 10: Explanations
Explainability in NLP The Mythos of Model Interpretability Zach Lipton, 2016

Deep Unordered Composition Rivals Syntactic Methods for Text Classification Mohit Iyyer et al., 2015

Analysis Methods in Neural Language Processing: A Survey Yonatan Belinkov and Jim Glass, 2019
Local Explanations: Highlights "Why Should I Trust You?" Explaining the Predictions of Any Classifier Marco Tulio Ribeiro et al., 2016

Axiomatic Attribution for Deep Networks Mukund Sundararajan et al., 2017
Model Probing BERT Rediscovers the Classical NLP Pipeline Ian Tenney et al., 2019

What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations Ian Tenney et al., 2019
Annotation Artifacts Annotation Artifacts in Natural Language Inference Data Suchin Gururangan et al., 2018

Hypothesis Only Baselines in Natural Language Inference Adam Poliak et al., 2018

Did the Model Understand the Question? Pramod Kaushik Mudrakarta et al., 2018

Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference Rowan Zellers et al., 2018
Text Explanations Generating Visual Explanations Lisa Anne Hendricks et al., 2016

e-SNLI: Natural Language Inference with Natural Language Explanations Oana-Maria Camburu et al., 2018

Explaining Question Answering Models through Text Generation Veronica Latcinnik and Jonathan Berant, 2020
Chain-of-thought Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems Wang Ling et al., 2017

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason Wei et al., 2022

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning Xi Ye and Greg Durrett, 2022

Large Language Models are Zero-Shot Reasoners Takeshi Kojima et al., 2022
Chain-of-thought: Extensions and Analysis Complementary Explanations for Effective In-Context Learning Xi Ye et al., 2023

PAL: Program-aided Language Models Luyu Gao et al., 2022

Measuring and Narrowing the Compositionality Gap in Language Models Ofir Press et al., 2022
Week 11: Question Answering, Dialogue Systems
Reading comprehension intro
Reading comprehension: setup and baselines MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text Matthew Richardson et al., 2013

SQuAD: 100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar et al., 2016
BERT for QA
Problems with Reading Comprehension Adversarial Examples for Evaluating Reading Comprehension Systems Robin Jia and Percy Liang, 2017
Open-domain QA Reading Wikipedia to Answer Open-Domain Questions Danqi Chen et al., 2017

Latent Retrieval for Weakly Supervised Open Domain Question Answering Kenton Lee et al., 2019

[Website] Natural Questions Tom Kwiatkowski et al., 2019

Most modern open-domain QA systems are either "closed-book" models like ChatGPT or "open-book" models that do retrieval, similar to the Chen et al. and Lee et al. papers above. The latter are typically described under the general framework of retrieval-augmented generation; WebGPT (similar to the "new Bing" chatbot) is one example of how such systems work.
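To make the "open-book" recipe above concrete, below is a minimal retrieve-then-read sketch of retrieval-augmented generation. It is only an illustration under simplifying assumptions: the passage collection is a toy in-memory list, retrieval is plain TF-IDF via scikit-learn, and generate is a hypothetical placeholder standing in for whatever language model a real system would call; it does not reproduce any specific system discussed above.

# Minimal retrieval-augmented generation sketch (illustrative only; not any specific system).
# Assumes scikit-learn is installed; `generate` is a hypothetical placeholder for an LLM call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Austin is the capital of the U.S. state of Texas.",
    "The University of Texas at Austin was founded in 1883.",
    "The Colorado River flows through central Austin.",
]

def retrieve(question, k=2):
    """Score every passage against the question with TF-IDF cosine similarity; return the top k."""
    vectors = TfidfVectorizer().fit_transform(passages + [question])
    scores = cosine_similarity(vectors[-1], vectors[:-1]).ravel()
    return [passages[i] for i in scores.argsort()[::-1][:k]]

def generate(prompt):
    """Placeholder: a real open-book system would call a language model on the prompt here."""
    return "(model output conditioned on %d characters of prompt)" % len(prompt)

def answer(question):
    """Open-book QA: retrieve evidence, then condition generation on it."""
    evidence = "\n".join(retrieve(question))
    prompt = "Context:\n%s\n\nQuestion: %s\nAnswer:" % (evidence, question)
    return generate(prompt)

print(answer("When was UT Austin founded?"))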
Multi-hop QA HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering Zhilin Yang et al., 2018

Understanding Dataset Design Choices for Multi-hop Reasoning Jifan Chen and Greg Durrett, 2019

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering Akari Asai et al., 2020

Modern QA systems operating over the web are largely multi-hop by default; multi-hop QA has been subsumed by open-domain QA to a large extent. For a more recent multi-hop QA dataset, see QAMPARI.
Dialogue: Chatbots
Task-Oriented Dialogue Wizard of Wikipedia: Knowledge-Powered Conversational Agents Emily Dinan et al., 2019

Task-Oriented Dialogue as Dataflow Synthesis Semantic Machines, 2020
Neural Chatbots A Neural Network Approach to Context-Sensitive Generation of Conversational Responses Alessandro Sordoni et al., 2015

A Diversity-Promoting Objective Function for Neural Conversation Models Jiwei Li et al., 2016

Recipes for building an open-domain chatbot Stephen Roller et al., 2020

Note: an updated version of BlenderBot is described in Kurt Shuster et al. Other chatbots discussed, like character.ai, are available to try online, but less information about their precise internals is available in published papers.
Week 12: Machine Translation, Summarization
Machine Translation Intro Eisenstein 18.1
MT: Framework and Evaluation Eisenstein 18.1
MT: Word alignment
MT: IBM Models HMM-Based Word Alignment in Statistical Translation Stephan Vogel et al., 1996
Phrase-based Machine Translation Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models Philipp Koehn, 2004

Minimum Error Rate Training in Statistical Machine Translation Franz Och, 2003

Eisenstein 18.4
Neural and Pre-Trained Machine Translation Revisiting Low-Resource Neural Machine Translation: A Case Study Rico Sennrich and Biao Zhang, 2019

In Neural Machine Translation, What Does Transfer Learning Transfer? Alham Fikri Aji et al., 2020

Multilingual Denoising Pre-training for Neural Machine Translation Yinhan Liu et al., 2020

Large Language Models Are State-of-the-Art Evaluators of Translation Quality Tom Kocmi and Christian Federmann, 2023
Summarization Intro
Extractive Summarization The use of MMR, diversity-based reranking for reordering documents and producing summaries Jaime Carbonell and Jade Goldstein, 1998

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization Gunes Erkan and Dragomir Radev, 2004

A Scalable Global Model for Summarization Dan Gillick and Benoit Favre, 2009

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization Demian Gholipour Ghalandari, 2017
Pre-trained Summarization and Factuality BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Mike Lewis et al., 2019

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization Jingqing Zhang et al., 2020

Evaluating Factuality in Generation with Dependency-level Entailment Tanya Goyal and Greg Durrett, 2020

Asking and Answering Questions to Evaluate the Factual Consistency of Summaries Alex Wang et al., 2020

Note: while the specific fine-tuned modeling approaches and factuality detection systems presented in the video are no longer state-of-the-art, they are representative of ideas from pre-training that are still used today. For discussion of how LLMs relate to summarization, see News Summarization and Evaluation in the Era of GPT-3 by Tanya Goyal, Junyi Jessy Li, and Greg Durrett.
Week 13-14: Multilinguality, Language Grounding, Ethical Issues
Morphology
Cross-lingual Tagging and Parsing Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections Dipanjan Das and Slav Petrov, 2011

Multi-Source Transfer of Delexicalized Dependency Parsers Ryan McDonald et al., 2011
Cross-lingual Pre-training Massively Multilingual Word Embeddings Waleed Ammar et al., 2016

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond Mikel Artetxe and Holger Schwenk, 2019

How multilingual is Multilingual BERT? Telmo Pires et al., 2019
Language Grounding Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data Emily Bender and Alexander Koller, 2020

Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? Will Merrill et al., 2021

Entailment Semantics Can Be Extracted from an Ideal Language Model Will Merrill et al., 2022

Experience Grounds Language Yonatan Bisk et al., 2020
Language and Vision VQA: Visual Question Answering Aishwarya Agrawal et al., 2015

Learning Transferable Visual Models From Natural Language Supervision Alec Radford et al., 2021
Ethics: Bias The Social Impact of Natural Language Processing Dirk Hovy and Shannon Spruit, 2016

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints Jieyu Zhao et al., 2017
Ethics: Exclusion GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models Da Yin et al., 2022

Visually Grounded Reasoning across Languages and Cultures Fangyu Liu et al., 2021
Ethics: Dangers of Automation On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Emily Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell, 2021

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models Samuel Gehman et al., 2020
Ethics: Unethical Use and Paths Forward Datasheets for Datasets Timnit Gebru et al., 2018

Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing Deb Raji et al., 2020