CS388: Natural Language Processing (online MS version)

These are the course materials for an online master's course in NLP. All lectures are available as videos on YouTube.

Note on enrollment for on-campus students: This course is listed in the course catalog as "Natural Language Processing-WB". It is a partially asynchronous course taught for certain online master's programs at UT ("Option III" programs, as the university calls them). If you are a student enrolled on-campus at UT Austin, you are not eligible to take this course; this is a hard requirement from the university because the course is part of an Option III program. There is an on-campus version of CS388, typically taught once per year by me, Eunsol Choi, or Ray Mooney, which you are eligible to take (or CS371N if you're an undergraduate student). Regardless, you are free to consult the materials here!

Assignments

Assignment 1: Linear Sentiment Classification [code and dataset download] [see edX for code walkthrough and debugging tips]

Assignment 2: Feedforward Neural Networks, Word Embeddings, and Generalization [code and dataset download] [see edX for code walkthrough and debugging tips]

Assignment 3: Transformer Language Modeling [code and dataset download] [see edX for code walkthrough and debugging tips]

Assignment 4: Factuality and ChatGPT [code and dataset download]

Final Project: Dataset Artifacts [code and dataset download] [example 1] [example 2] [peer assessment instructions]

Lecture Videos and Readings

YouTube playlist containing all videos

Download the slides and handwritten notes here (88MB tgz)

Topics and Videos / Readings
Week 1: Intro and Linear Classification
Course Preview
Introduction
Note: this introduction video is from an older run of the class and references an outdated schedule. Please refer to the new course structure here.
Linear Binary Classification Eisenstein 2.0-2.5, 4.2-4.4.1

Perceptron and logistic regression
Sentiment Analysis and Basic Feature Extraction Eisenstein 4.1
Basics of Learning, Gradient Descent
Perceptron
Perceptron as Minimizing Loss
Logistic Regression Perceptron and LR connections
Sentiment Analysis Thumbs up? Sentiment Classification using Machine Learning Techniques Bo Pang et al., 2002

Baselines and Bigrams: Simple, Good Sentiment and Topic Classification Sida Wang and Christopher Manning, 2012

Convolutional Neural Networks for Sentence Classification Yoon Kim, 2014

[GitHub] NLP Progress on Sentiment Analysis
Optimization Basics
Week 2: Multiclass and Neural Classification
Multiclass Classification Eisenstein 4.2

Multiclass lecture note
Multiclass Perceptron and Logistic Regression
Multiclass Classification Examples A large annotated corpus for learning natural language inference Sam Bowman et al., 2015

Authorship Attribution of Micro-Messages Roy Schwartz et al., 2013
Fairness in Classification 50 Years of Test (Un)fairness: Lessons for Machine Learning Ben Hutchinson and Margaret Mitchell, 2018

[Article] Amazon scraps secret AI recruiting tool that showed bias against women
Neural Networks
Neural Network Visualization [Blog] Neural Networks, Manifolds, and Topology Chris Olah
Feedforward Neural Networks, Backpropagation Eisenstein Chapter 3.1-3.3
Neural Net Implementation
Neural Net Training, Optimization Dropout: a simple way to prevent neural networks from overfitting Nitish Srivastava et al., 2014

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Sergey Ioffe and Christian Szegedy, 2015

Adam: A Method for Stochastic Optimization Durk Kingma and Jimmy Ba, 2015

The Marginal Value of Adaptive Gradient Methods in Machine Learning Ashia Wilson et al., 2017
Week 3: Word Embeddings
Word Embeddings
Skip-gram Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov et al., 2013
Other Word Embedding Methods A Scalable Hierarchical Distributed Language Model Andriy Mnih and Geoff Hinton, 2008

Neural Word Embedding as Implicit Matrix Factorization Omer Levy and Yoav Goldberg, 2014

GloVe: Global Vectors for Word Representation Jeffrey Pennington et al., 2014

Enriching Word Vectors with Subword Information Piotr Bojanowski et al., 2016
Bias in Word Embeddings Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings Tolga Bolukbasi et al., 2016

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings Thomas Manzini et al., 2019

Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them Hila Gonen and Yoav Goldberg, 2019
Applying Embeddings, Deep Averaging Networks Deep Unordered Composition Rivals Syntactic Methods for Text Classification Mohit Iyyer et al., 2015
Week 4: Language Modeling and Self-Attention
n-gram LMs Eisenstein 6.1
Smoothing in n-gram LMs Eisenstein 6.2
LM Evaluation Eisenstein 6.4
Neural Language Models
RNNs and their Shortcomings Eisenstein 6.3

[Blog] Understanding LSTMs Chris Olah
Attention Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau et al., 2015
Self-Attention Attention Is All You Need Ashish Vaswani et al., 2017
Multi-Head Self-Attention Attention Is All You Need Ashish Vaswani et al., 2017

[Blog] The Illustrated Transformer Jay Alammar
Position Encodings Attention Is All You Need Ashish Vaswani et al., 2017

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press et al., 2021

The Impact of Positional Encoding on Length Generalization in Transformers Amirhossein Kazemnejad et al., 2023
Week 5: Transformers and Decoding
Transformer Architecture Attention Is All You Need Ashish Vaswani et al., 2017
Using Transformers
Transformer Language Modeling
Transformer Extensions Scaling Laws for Neural Language Models Jared Kaplan et al., 2020

Efficient Transformers: A Survey Yi Tay et al., 2020

Rethinking Attention with Performers Krzysztof Choromanski et al., 2021

Longformer: The Long-Document Transformer Iz Beltagy et al., 2020
Beam Search
Nucleus Sampling The Curious Case of Neural Text Degeneration Ari Holtzman et al., 2019
Week 6: Pre-training, seq2seq LMs
BERT: Masked Language Modeling BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin et al., 2019
BERT: Model and Applications BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin et al., 2019

To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks Matthew Peters et al., 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang et al., 2019

What Does BERT Look At? An Analysis of BERT's Attention Kevin Clark et al., 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu et al., 2019
Seq2seq Models
BART BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Mike Lewis et al., 2019
T5 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Colin Raffel et al., 2020

UnifiedQA: Crossing Format Boundaries With a Single QA System Daniel Khashabi et al., 2020

Word Piece and Byte Pair Encoding Neural Machine Translation of Rare Words with Subword Units Rico Sennrich et al., 2016

Byte Pair Encoding is Suboptimal for Language Model Pretraining Kaj Bostrom and Greg Durrett, 2020
Week 7-8: Structured Prediction: Part-of-speech, Syntactic Parsing
Note: this unit was previously presented as Week 4, right after classification, so there are a few references to it being our first brush with structured models. In the current structure of the course, it is still our first exposure to models that deal with linguistic structure, as opposed to surface-level sequential structure (i.e., token sequences in generation).
Part-of-Speech Tagging Eisenstein 8.1
Sequence Labeling, Tagging with Classifiers Eisenstein 7.1
Hidden Markov Models Eisenstein 7.4
HMMs: Parameter Estimation Eisenstein 7.4.1
HMMs: Viterbi Algorithm Eisenstein 7.3
HMMs for POS Tagging TnT - A Statistical Part-of-Speech Tagger Thorsten Brants, 2000

Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger Kristina Toutanova and Christopher Manning, 2000

Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? Christopher Manning, 2011

Natural Language Processing with Small Feed-Forward Networks Jan Botha et al., 2017
Constituency Parsing Eisenstein 10.1-10.2
Probabilistic Context-Free Grammars Eisenstein 10.3-10.4
CKY Algorithm Eisenstein 10.3.1
Refining Grammars Accurate Unlexicalized Parsing Dan Klein and Chris Manning, 2003

Eisenstein 10.5
Dependencies Eisenstein 11.1

Finding Optimal 1-Endpoint-Crossing Trees Emily Pitler et al., 2013
Transition-based Dependency Parsing Eisenstein 11.3
Week 9: Modern Large Language Models
GPT-3 Language Models are Unsupervised Multitask Learners Alec Radford et al., 2019

Language Models are Few-Shot Learners Tom B. Brown et al., 2020

Llama 2: Open Foundation and Fine-Tuned Chat Models Hugo Touvron et al., 2023

Llama 2 is one of the latest models with publicly available weights (although it is not fully open-source, as many details of the training are not public).
Zero-shot Prompting Demystifying Prompts in Language Models via Perplexity Estimation Hila Gonen et al., 2022
Few-shot Prompting Calibrate Before Use: Improving Few-Shot Performance of Language Models Tony Z. Zhao et al., 2021

Holistic Evaluation of Language Models Percy Liang et al., 2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Sewon Min et al., 2022
Understanding ICL: Induction Heads In-context Learning and Induction Heads Catherine Olsson et al., 2022
Instruction Tuning Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh et al., 2021

Scaling Instruction-Finetuned Language Models Hyung Won Chung et al., 2022
Reinforcement Learning from Human Feedback (RLHF) Training language models to follow instructions with human feedback Long Ouyang et al., 2022

[Website] Stanford Alpaca: An Instruction-following LLaMA Model Rohan Taori et al., 2023
Factuality of LLMs Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation Yixin Liu et al., 2023

WiCE: Real-World Entailment for Claims in Wikipedia Ryo Kamoi et al., 2023

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization Philippe Laban et al., 2022

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation Sewon Min et al., 2023

RARR: Researching and Revising What Language Models Say, Using Language Models Luyu Gao et al., 2022
Week 10: Explanations
Explainability in NLP The Mythos of Model Interpretability Zach Lipton, 2016

Deep Unordered Composition Rivals Syntactic Methods for Text Classification Mohit Iyyer et al., 2015

Analysis Methods in Neural Language Processing: A Survey Yonatan Belinkov and Jim Glass, 2019
Local Explanations: Highlights "Why Should I Trust You?" Explaining the Predictions of Any Classifier Marco Tulio Ribeiro et al., 2016

Axiomatic Attribution for Deep Networks Mukund Sundararajan et al., 2017
Model Probing BERT Rediscovers the Classical NLP Pipeline Ian Tenney et al., 2019

What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations Ian Tenney et al., 2019
Annotation Artifacts Annotation Artifacts in Natural Language Inference Data Suchin Gururangan et al., 2018

Hypothesis Only Baselines in Natural Language Inference Adam Poliak et al., 2018

Did the Model Understand the Question? Pramod Kaushik Mudrakarta et al., 2018

Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference Rowan Zellers et al., 2018
Text Explanations Generating Visual Explanations Lisa Anne Hendricks et al., 2016

e-SNLI: Natural Language Inference with Natural Language Explanations Oana-Maria Camburu et al., 2018

Explaining Question Answering Models through Text Generation Veronica Latcinnik and Jonathan Berant, 2020
Chain-of-thought Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems Wang Ling et al., 2017

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason Wei et al., 2022

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning Xi Ye and Greg Durrett, 2022

Large Language Models are Zero-Shot Reasoners Takeshi Kojima et al., 2022
Chain-of-thought: Extensions and Analysis Complementary Explanations for Effective In-Context Learning Xi Ye et al., 2023

PAL: Program-aided Language Models Luyu Gao et al., 2022

Measuring and Narrowing the Compositionality Gap in Language Models Ofir Press et al., 2022
Week 11: Question Answering, Dialogue Systems
Reading comprehension intro
Reading comprehension: setup and baselines MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text Matthew Richardson et al., 2013

SQuAD: 100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar et al., 2016
BERT for QA
Problems with Reading Comprehension Adversarial Examples for Evaluating Reading Comprehension Systems Robin Jia and Percy Liang, 2017
Open-domain QA Reading Wikipedia to Answer Open-Domain Questions Danqi Chen et al., 2017

Latent Retrieval for Weakly Supervised Open Domain Question Answering Kenton Lee et al., 2019

[Website] Natural Questions Tom Kwiatkowski et al., 2019

Most modern open-domain QA systems are either "closed-book" models like ChatGPT or "open-book" models that do retrieval, similar to the Chen et al. and Lee et al. papers above. The latter are typically described under the general framework of retrieval-augmented generation; WebGPT (similar to the "new Bing" chatbot) is one example of how such systems work.
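To make the "open-book" recipe above concrete, below is a minimal retrieve-then-read sketch of retrieval-augmented generation. It is only an illustration under simplifying assumptions: the passage collection is a toy in-memory list, retrieval is plain TF-IDF via scikit-learn, and generate is a hypothetical placeholder standing in for whatever language model a real system would call; it does not reproduce any specific system discussed above.

# Minimal retrieval-augmented generation sketch (illustrative only; not any specific system).
# Assumes scikit-learn is installed; `generate` is a hypothetical placeholder for an LLM call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Austin is the capital of the U.S. state of Texas.",
    "The University of Texas at Austin was founded in 1883.",
    "The Colorado River flows through central Austin.",
]

def retrieve(question, k=2):
    """Score every passage against the question with TF-IDF cosine similarity; return the top k."""
    vectors = TfidfVectorizer().fit_transform(passages + [question])
    scores = cosine_similarity(vectors[-1], vectors[:-1]).ravel()
    return [passages[i] for i in scores.argsort()[::-1][:k]]

def generate(prompt):
    """Placeholder: a real open-book system would call a language model on the prompt here."""
    return "(model output conditioned on %d characters of prompt)" % len(prompt)

def answer(question):
    """Open-book QA: retrieve evidence, then condition generation on it."""
    evidence = "\n".join(retrieve(question))
    prompt = "Context:\n%s\n\nQuestion: %s\nAnswer:" % (evidence, question)
    return generate(prompt)

print(answer("When was UT Austin founded?"))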
Multi-hop QA HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering Zhilin Yang et al., 2018

Understanding Dataset Design Choices for Multi-hop Reasoning Jifan Chen and Greg Durrett, 2019

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering Akari Asai et al., 2020

Modern QA systems operating over the web are largely multi-hop by default; multi-hop QA has been subsumed by open-domain QA to a large extent. For a more recent multi-hop QA dataset, see QAMPARI.
Dialogue: Chatbots
Task-Oriented Dialogue Wizard of Wikipedia: Knowledge-Powered Conversational Agents Emily Dinan et al., 2019

Task-Oriented Dialogue as Dataflow Synthesis Semantic Machines, 2020
Neural Chatbots A Neural Network Approach to Context-Sensitive Generation of Conversational Responses Alessandro Sordoni et al., 2015

A Diversity-Promoting Objective Function for Neural Conversation Models Jiwei Li et al., 2016

Recipes for building an open-domain chatbot Stephen Roller et al., 2020

Note: an updated version of BlenderBot is described in Kurt Shuster et al. Other chatbots discussed, like character.ai, are available to try online, but less information about their precise internals is available in published papers.
Week 12: Machine Translation, Summarization
Machine Translation Intro Eisenstein 18.1
MT: Framework and Evaluation Eisenstein 18.1
MT: Word alignment
MT: IBM Models HMM-Based Word Alignment in Statistical Translation Stephan Vogel et al., 1996
Phrase-based Machine Translation Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models Philipp Koehn, 2004

Minimum Error Rate Training in Statistical Machine Translation Franz Och, 2003

Eisenstein 18.4
Neural and Pre-Trained Machine Translation Revisiting Low-Resource Neural Machine Translation: A Case Study Rico Sennrich and Biao Zhang, 2019

In Neural Machine Translation, What Does Transfer Learning Transfer? Alham Fikri Aji et al., 2020

Multilingual Denoising Pre-training for Neural Machine Translation Yinhan Liu et al., 2020

Large Language Models Are State-of-the-Art Evaluators of Translation Quality Tom Kocmi and Christian Federmann, 2023
Summarization Intro
Extractive Summarization The use of MMR, diversity-based reranking for reordering documents and producing summaries Jaime Carbonell and Jade Goldstein, 1998

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization Gunes Erkan and Dragomir Radev, 2004

A Scalable Global Model for Summarization Dan Gillick and Benoit Favre, 2009

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization Demian Gholipour Ghalandari, 2017
Pre-trained Summarization and Factuality BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Mike Lewis et al., 2019

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization Jingqing Zhang et al., 2020

Evaluating Factuality in Generation with Dependency-level Entailment Tanya Goyal and Greg Durrett, 2020

Asking and Answering Questions to Evaluate the Factual Consistency of Summaries Alex Wang et al., 2020

Note: while the specific fine-tuned modeling approaches and factuality detection systems presented in the video are no longer state-of-the-art, they are representative of ideas from pre-training that are still used today. For discussion of how LLMs relate to summarization, see News Summarization and Evaluation in the Era of GPT-3 by Tanya Goyal, Junyi Jessy Li, and Greg Durrett.
Week 13-14: Multilinguality, Language Grounding, Ethical Issues
Morphology
Cross-lingual Tagging and Parsing Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections Dipanjan Das and Slav Petrov, 2011

Multi-Source Transfer of Delexicalized Dependency Parsers Ryan McDonald et al., 2011
Cross-lingual Pre-training Massively Multilingual Word Embeddings Waleed Ammar et al., 2016

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond Mikel Artetxe and Holger Schwenk, 2019

How multilingual is Multilingual BERT? Telmo Pires et al., 2019
Language Grounding Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data Emily Bender and Alexander Koller, 2020

Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? Will Merrill et al., 2021

Entailment Semantics Can Be Extracted from an Ideal Language Model Will Merrill et al., 2022

Experience Grounds Language Yonatan Bisk et al., 2020
Language and Vision VQA: Visual Question Answering Aishwarya Agrawal et al., 2015

Learning Transferable Visual Models From Natural Language Supervision Alec Radford et al., 2021
Ethics: Bias The Social Impact of Natural Language Processing Dirk Hovy and Shannon Spruit, 2016

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints Jieyu Zhao et al., 2017
Ethics: Exclusion GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models Da Yin et al., 2022

Visually Grounded Reasoning across Languages and Cultures Fangyu Liu et al., 2021
Ethics: Dangers of Automation On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Emily Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell, 2021

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models Samuel Gehman et al., 2020
Ethics: Unethical Use and Paths Forward Datasheets for Datasets Timnit Gebru et al., 2018

Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing Deb Raji et al., 2020