CS388: Natural Language Processing (online MS version)

These are the course materials for an online master's course in NLP. The videos are large files and can sometimes cause trouble when played in the browser; if the Google Drive player isn't working well, download them instead. VLC is a good video player that lets you change playback speed.

Note on enrollment: if you are a student enrolled on-campus at UT Austin, you are not eligible to take this course. This is a hard university requirement because the course is part of an Option III MS program. There is an on-campus version of CS388, typically taught once per year by me, Eunsol Choi, or Ray Mooney, which you are eligible to take. Regardless, you are free to consult the materials here!

Assignments

If you are currently enrolled in the class, note carefully the version tag on each assignment posted here; assignments from past semesters are subject to change.

[FALL 2021 VERSION] Assignment 1: Linear Sentiment Classification [code and dataset download]

[FALL 2021 VERSION] Assignment 2: Sentiment with Feedforward Neural Networks [code and dataset download]

[FALL 2021 VERSION] Assignment 3: HMMs and CRFs for NER [code and dataset download]

[FALL 2021 VERSION] Assignment 4: Character Language Modeling with RNNs [code and dataset download]

[FALL 2021 VERSION] Assignment 5: Semantic Parsing with Encoder-Decoder Models [code and dataset download]

[FALL 2021 VERSION] Final Project: Dataset Artifacts [code and dataset download] [example 1] [example 2]

Lectures

Download the slides and handwritten notes here (90MB tgz)

Topics and Videos | Readings
Introduction
Binary Classification Eisenstein 2.0-2.5, 4.2-4.4.1
Perceptron and logistic regression
Sentiment Analysis and Basic Feature Extraction Eisenstein 4.1
Basics of Learning, Gradient Descent
Perceptron
Perceptron as Minimizing Loss
Logistic Regression Perceptron and LR connections
Sentiment Analysis Thumbs up? Sentiment Classification using Machine Learning Techniques Pang et al. 2002

Baselines and Bigrams: Simple, Good Sentiment and Topic Classification Wang and Manning 2012

Convolutional Neural Networks for Sentence Classification Kim 2014

[Github] NLP Progress on Sentiment Analysis
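The perceptron and bag-of-words sentiment classification covered above can be sketched in a few lines. This is a toy illustration with a made-up two-example dataset and word-count features, not the assignment's actual data or starter code:

```python
# Toy binary perceptron for sentiment with bag-of-words features.
# Dataset and feature scheme are illustrative only.

def featurize(text):
    # Bag-of-words: map each word to its count in the text
    feats = {}
    for word in text.lower().split():
        feats[word] = feats.get(word, 0) + 1
    return feats

def predict(weights, feats):
    score = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if score >= 0 else -1

def train_perceptron(examples, epochs=10):
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            if predict(weights, feats) != label:
                # Mistake-driven update: w <- w + y * f(x)
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

examples = [("terrible boring movie", -1), ("great fun movie", 1)]
w = train_perceptron(examples)
print(predict(w, featurize("great fun")))  # -> 1
```

Note that updates happen only on mistakes, which is the key difference from the logistic regression gradient update discussed in lecture.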
Optimization Basics
Multiclass Classification Eisenstein 4.2
Multiclass lecture note
Multiclass Perceptron and Logistic Regression
Multiclass Classification Examples A large annotated corpus for learning natural language inference Bowman et al. 2015

Authorship Attribution of Micro-Messages Schwartz et al. 2013
Fairness in Classification 50 Years of Test (Un)fairness: Lessons for Machine Learning Hutchinson and Mitchell 2018

Amazon scraps secret AI recruiting tool that showed bias against women
Neural Networks
Neural Network Visualization Neural Networks, Manifolds, and Topology
Feedforward Neural Networks, Backpropagation Eisenstein Chapter 3.1-3.3
Neural Net Implementation
Neural Net Training, Optimization Dropout: a simple way to prevent neural networks from overfitting Srivastava et al. 2014

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Ioffe and Szegedy 2015

Adam: A Method for Stochastic Optimization Kingma and Ba 2015

The Marginal Value of Adaptive Gradient Methods in Machine Learning Wilson et al. 2017
Word Embeddings
Skip-gram Distributed Representations of Words and Phrases and their Compositionality Mikolov et al. 2013
Other Word Embedding Methods A Scalable Hierarchical Distributed Language Model Mnih and Hinton 2008

Neural Word Embedding as Implicit Matrix Factorization Levy and Goldberg 2014

GloVe: Global Vectors for Word Representation Pennington et al. 2014

Enriching Word Vectors with Subword Information Bojanowski et al. 2016
Bias in Word Embeddings Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings Bolukbasi et al. 2016

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings Manzini et al. 2019

Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them Gonen and Goldberg 2019
Applying Embeddings, Deep Averaging Networks Deep Unordered Composition Rivals Syntactic Methods for Text Classification Iyyer et al. 2015
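A standard way to use the embeddings above is to compare words by cosine similarity. The 3-dimensional vectors below are hand-picked toys, not real skip-gram or GloVe output:

```python
import math

# Cosine similarity between word vectors; toy hand-set vectors.
vectors = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad":   [-0.7, 0.1, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(vectors["good"], vectors["great"]))  # high (similar words)
print(cosine(vectors["good"], vectors["bad"]))    # negative (opposites)
```

A deep averaging network (Iyyer et al. 2015, above) simply averages such vectors over a sentence and feeds the result to a feedforward classifier.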
Part-of-Speech Tagging
Sequence Labeling, Tagging with Classifiers
Hidden Markov Models
HMMs: Parameter Estimation
HMMs: Viterbi Algorithm
Beam Search
HMMs for POS Tagging TnT - A Statistical Part-of-Speech Tagger Brants 2000

Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger Toutanova and Manning 2000

Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? Manning 2011

Natural Language Processing with Small Feed-Forward Networks Botha et al. 2017
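The Viterbi algorithm from the HMM lectures can be sketched as follows. The two-state grammar and all probabilities here are made up for illustration:

```python
import math

# Viterbi decoding for a toy 2-state HMM, in log space.
states = ["N", "V"]
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.6, "swim": 0.4}, "V": {"fish": 0.3, "swim": 0.7}}

def viterbi(words):
    lp = math.log
    # chart[i][s] = best log prob of any path ending in state s at position i
    chart = [{s: lp(start[s]) + lp(emit[s][words[0]]) for s in states}]
    back = []
    for w in words[1:]:
        prev = chart[-1]
        col, bp = {}, {}
        for s in states:
            best = max(states, key=lambda sp: prev[sp] + lp(trans[sp][s]))
            col[s] = prev[best] + lp(trans[best][s]) + lp(emit[s][w])
            bp[s] = best
        chart.append(col)
        back.append(bp)
    # Follow backpointers from the best final state
    tag = max(states, key=lambda s: chart[-1][s])
    tags = [tag]
    for bp in reversed(back):
        tag = bp[tag]
        tags.append(tag)
    return list(reversed(tags))

print(viterbi(["fish", "swim"]))  # -> ['N', 'V']
```

Replacing the max in the recurrence with a (log-)sum gives the forward algorithm used in CRF training; keeping only the top few states per column instead of all of them gives beam search.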
Conditional Random Fields
Features for NER
Inference and Learning in CRFs
Forward-backward Algorithm
NER Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling Finkel et al. 2005

Design Challenges and Misconceptions in Named Entity Recognition Ratinov and Roth 2009

Neural Architectures for Named Entity Recognition Lample et al. 2016

Ultra-Fine Entity Typing Choi et al. 2018
Constituency Parsing
Probabilistic Context-Free Grammars
CKY Algorithm
Refining Grammars Accurate Unlexicalized Parsing Klein and Manning 2003
Dependencies Finding Optimal 1-Endpoint-Crossing Trees Pitler et al. 2013
Transition-based Dependency Parsing
State-of-the-art Parsers Max-Margin Parsing Taskar et al. 2004

Less Grammar, More Features Hall et al. 2014

Neural CRF Parsing Durrett and Klein 2015

Constituency Parsing with a Self-Attentive Encoder Kitaev and Klein 2018

Online Large-Margin Training of Dependency Parsers McDonald et al. 2005

Efficient Third-Order Dependency Parsers Koo and Collins 2010

Stanford's Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task Dozat et al. 2017

A Fast and Accurate Dependency Parser using Neural Networks Chen and Manning 2014

Globally Normalized Transition-Based Neural Networks Andor et al. 2016
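The CKY algorithm from the constituency parsing lectures can be sketched as a recognizer. The grammar below is a hypothetical two-rule fragment in Chomsky normal form, not a treebank grammar:

```python
# CKY recognition for a toy CFG in Chomsky normal form.
unary = {  # terminal productions A -> w
    "she": {"NP"}, "eats": {"V"}, "fish": {"NP"},
}
binary = {  # binary productions A -> B C
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
}

def cky_recognize(words, goal="S"):
    n = len(words)
    # chart[i][j] = set of nonterminals spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(unary.get(w, ()))
    for length in range(2, n + 1):              # span length
        for i in range(0, n - length + 1):      # span start
            j = i + length
            for k in range(i + 1, j):           # split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        if (b, c) in binary:
                            chart[i][j].add(binary[(b, c)])
    return goal in chart[0][n]

print(cky_recognize(["she", "eats", "fish"]))  # -> True
```

A PCFG parser is the same dynamic program with rule scores and argmax backpointers in place of sets, giving the O(n^3) chart discussed in lecture.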
N-gram LMs
Smoothing in N-gram LMs
Neural Language Models
Basic RNNs, Elman Networks Understanding LSTM Networks
Gates and LSTMs A Primer on Neural Network Models for Natural Language Processing Goldberg 2015

Understanding LSTM Networks
RNN Applications
RNN Language Modeling
Visualizing LSTMs Visualizing and Understanding Recurrent Networks Karpathy et al. 2016
ELMo Deep Contextualized Word Representations Peters et al. 2018

To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks Peters et al. 2019
Model Theoretic Semantics
Montague Semantics
CCG Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars Zettlemoyer and Collins 2005
Seq2seq models
Seq2seq models: Training and Implementation Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks Bengio et al. 2015
Seq2seq Semantic Parsing Data Recombination for Neural Semantic Parsing Jia and Liang 2016
Attention: Problems with seq2seq models Neural Machine Translation By Jointly Learning To Align And Translate Bahdanau et al. 2015

Addressing the Rare Word Problem in Neural Machine Translation Luong et al. 2015
Attention: Model and Implementation Neural Machine Translation By Jointly Learning To Align And Translate Bahdanau et al. 2015

Effective Approaches to Attention-based Neural Machine Translation Luong et al. 2015
Copying and Pointers Addressing the Rare Word Problem in Neural Machine Translation Luong et al. 2015

Data Recombination for Neural Semantic Parsing Jia and Liang 2016
Word Piece and Byte Pair Encoding Neural Machine Translation of Rare Words with Subword Units Sennrich et al. 2016

Byte Pair Encoding is Suboptimal for Language Model Pretraining Bostrom and Durrett 2020
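The BPE procedure from Sennrich et al. (above) can be sketched as a merge loop: repeatedly merge the most frequent adjacent symbol pair. The word frequencies below are made up for illustration:

```python
from collections import Counter

# Byte pair encoding merges on a toy frequency table.
def bpe_merges(word_freqs, num_merges):
    # Represent each word as a tuple of symbols (characters to start)
    vocab = {tuple(word): f for word, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, f in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += f
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = {}
        for symbols, f in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                # Replace each occurrence of the best pair with one symbol
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = f
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges({"low": 5, "lower": 2, "lowest": 3}, 2)
print(merges)  # two merges building up the subword "low"
```

At test time the learned merge list is replayed in order on new words, so frequent strings become single tokens while rare words fall back to smaller pieces.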
Transformers Attention Is All You Need Vaswani et al. 2017
Machine Translation Intro
MT: Framework and Evaluation
MT: Word alignment
MT: IBM Models HMM-Based Word Alignment in Statistical Translation Vogel et al. 1996
Phrase-based Machine Translation Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models Koehn 2004

Minimum Error Rate Training in Statistical Machine Translation Och 2003
Syntactic Machine Translation What's in a translation rule? Galley et al. 2004
Neural Machine Translation Addressing the Rare Word Problem in Neural Machine Translation Luong et al. 2015

Effective Approaches to Attention-based Neural Machine Translation Luong et al. 2015

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Wu et al. 2016

Revisiting Low-Resource Neural Machine Translation: A Case Study Sennrich and Zhang 2019
BERT: Masked Language Modeling BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Devlin et al. 2019
BERT: Model and Applications BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Devlin et al. 2019

To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks Peters et al. 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Wang et al. 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach Liu et al. 2019
GPT-2 and GPT-3 Language Models are Unsupervised Multitask Learners Radford et al. 2019

Language Models are Few-Shot Learners Brown et al. 2020
BART and other pre-training BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Lewis et al. 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Raffel et al. 2020
Reading comprehension intro
Reading comprehension: setup and baselines MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text Richardson et al. 2013

SQuAD: 100,000+ Questions for Machine Comprehension of Text Rajpurkar et al. 2016
Attentive Reader Teaching Machines to Read and Comprehend Hermann et al. 2015

Reading Wikipedia to Answer Open-Domain Questions Chen et al. 2017
Improved Reading Comprehension Bi-directional Attention Flow For Machine Comprehension Seo et al. 2017
BERT for QA RACE: Large-scale ReAding Comprehension Dataset From Examinations Lai et al. 2017
Problems with Reading Comprehension Adversarial Examples for Evaluating Reading Comprehension Systems Jia and Liang 2017
Open-domain QA RACE: Large-scale ReAding Comprehension Dataset From Examinations Lai et al. 2017

Latent Retrieval for Weakly Supervised Open Domain Question Answering Lee et al. 2019

Natural Questions: A Benchmark for Question Answering Research Kwiatkowski et al. 2019
Multi-hop QA Understanding Dataset Design Choices for Multi-hop Reasoning Chen and Durrett 2019

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering Asai et al. 2020
Explainability in NLP The Mythos of Model Interpretability Lipton 2016

Deep Unordered Composition Rivals Syntactic Methods for Text Classification Iyyer et al. 2015

Analysis Methods in Neural Language Processing: A Survey Belinkov and Glass 2019
Local Explanations: Highlights "Why Should I Trust You?" Explaining the Predictions of Any Classifier Ribeiro et al. 2016

Axiomatic Attribution for Deep Networks Sundararajan et al. 2017
Text Explanations Generating Visual Explanations Hendricks et al. 2016

Explaining Question Answering Models through Text Generation Latcinnik and Berant 2020
Model Probing BERT Rediscovers the Classical NLP Pipeline Tenney et al. 2019

What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations Tenney et al. 2019
Annotation Artifacts Annotation Artifacts in Natural Language Inference Data Gururangan et al. 2018

Hypothesis Only Baselines in Natural Language Inference Poliak et al. 2018

Did the Model Understand the Question? Mudrakarta et al. 2018

Understanding Dataset Design Choices for Multi-hop Reasoning Chen and Durrett 2019

Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference Zellers et al. 2018
Summarization Intro
Extractive Summarization The use of MMR, diversity-based reranking for reordering documents and producing summaries Carbonell and Goldstein 1998

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization Erkan and Radev 2004

A Scalable Global Model for Summarization Gillick and Favre 2009

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization Ghalandari 2017
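The MMR criterion from Carbonell and Goldstein (above) balances relevance against redundancy when greedily selecting sentences. This sketch uses a simple word-overlap (Jaccard) similarity standing in for the paper's similarity functions, with a made-up three-sentence document:

```python
# Maximal Marginal Relevance for extractive selection:
# MMR(s) = lam * sim(s, query) - (1 - lam) * max_{t in selected} sim(s, t)
def overlap_sim(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def mmr_select(sentences, query, k=2, lam=0.7):
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            redundancy = max((overlap_sim(s, t) for t in selected),
                             default=0.0)
            return lam * overlap_sim(s, query) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

doc = ["the team won the game",
       "the team won the game today",
       "fans celebrated downtown"]
print(mmr_select(doc, query="team game result", k=2))
```

Note that the near-duplicate second sentence is skipped in favor of the less relevant but non-redundant third one; that redundancy penalty is the whole point of MMR over pure relevance ranking.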
Neural Extractive Models Fine-tune BERT for Extractive Summarization Liu 2019
Compressive Summarization Jointly Learning to Extract and Compress Berg-Kirkpatrick et al. 2011

Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints Durrett et al. 2016

Neural Extractive Text Summarization with Syntactic Compression Xu and Durrett 2019
Abstractive Summarization Abstractive Sentence Summarization with Attentive Recurrent Neural Networks Chopra et al. 2016

Get To The Point: Summarization with Pointer-Generator Networks See et al. 2017
Pre-trained Summarization and Factuality BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Lewis et al. 2019

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization Zhang et al. 2020

Evaluating Factuality in Generation with Dependency-level Entailment Goyal and Durrett 2020
Dialogue: Chatbots
Neural Chatbots A Neural Network Approach to Context-Sensitive Generation of Conversational Responses Sordoni et al. 2015

A Diversity-Promoting Objective Function for Neural Conversation Models Li et al. 2016

Personalizing Dialogue Agents: I have a dog, do you have pets too? Zhang et al. 2018
Task-Oriented Dialogue Wizards of Wikipedia: Knowledge-Powered Conversational Agents Dinan et al. 2019
Dialogue and QA QuAC : Question Answering in Context Choi et al. 2018

Interpretation of Natural Language Rules in Conversational Machine Reading Saeidi et al. 2018
Morphology
Morphological Analysis Supervised Learning of Complete Morphological Paradigms Durrett and DeNero 2013

Translating into Morphologically Rich Languages with Synthetic Phrases Chahuneau et al. 2013
Cross-lingual Tagging and Parsing Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections Das and Petrov 2011

Multi-Source Transfer of Delexicalized Dependency Parsers McDonald et al. 2011
Cross-lingual Pre-training Massively Multilingual Word Embeddings Ammar et al. 2016

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond Artetxe and Schwenk 2019

How multilingual is Multilingual BERT? Pires et al. 2019
Ethical Issues in NLP