Topics and Videos | Readings |
00 Introduction | |
01 Linear Binary Classification |
Eisenstein 2.0-2.5, 4.2-4.4.1
Perceptron and logistic regression
|
02 Sentiment Analysis and Basic Feature Extraction | Eisenstein 4.1 |
03 Basics of Learning, Gradient Descent | |
04 Perceptron | |
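For orientation, a minimal sketch of the perceptron's mistake-driven update over sparse dict-valued features; the function names and toy data are illustrative, not taken from the course materials.

```python
# Binary perceptron: features are dicts (name -> value), labels are +1/-1.
def dot(weights, features):
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def perceptron_train(data, epochs=10):
    weights = {}
    for _ in range(epochs):
        for features, label in data:
            pred = 1 if dot(weights, features) >= 0 else -1
            if pred != label:  # update only on mistakes: w <- w + y * x
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

# Toy usage on two linearly separable examples.
print(perceptron_train([({"good": 1.0}, 1), ({"bad": 1.0}, -1)]))
```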
05 Perceptron as Minimizing Loss | |
06 Logistic Regression | Perceptron and LR connections |
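A minimal SGD sketch for binary logistic regression in the same sparse-feature style (labels are 1/0; names are illustrative). Swapping the probability p for a hard 0/1 prediction recovers the perceptron update, the kind of connection the linked note discusses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_train(data, epochs=20, step=0.1):
    w = {}
    for _ in range(epochs):
        for feats, y in data:  # y is 1 or 0
            p = sigmoid(sum(w.get(f, 0.0) * v for f, v in feats.items()))
            for f, v in feats.items():
                # gradient of the log loss wrt each weight is (p - y) * x_f
                w[f] = w.get(f, 0.0) - step * (p - y) * v
    return w
```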
07 Sentiment Analysis |
Thumbs up? Sentiment Classification using Machine Learning Techniques, Pang et al. 2002
Baselines and Bigrams: Simple, Good Sentiment and Topic Classification, Wang and Manning 2012
Convolutional Neural Networks for Sentence Classification, Kim 2014
[Github] NLP Progress on Sentiment Analysis
|
08 Optimization Basics | |
09 Multiclass Classification |
Eisenstein 4.2
Multiclass lecture note
|
10 Multiclass Perceptron and Logistic Regression | |
11 Multiclass Classification Examples |
A large annotated corpus for learning natural language inference, Bowman et al. 2015
Authorship Attribution of Micro-Messages, Schwartz et al. 2013
|
11-2 Fairness in Classification |
50 Years of Test (Un)fairness: Lessons for Machine Learning, Hutchinson and Mitchell 2018
Amazon scraps secret AI recruiting tool that showed bias against women
|
12 Neural Networks | |
13 Neural Network Visualization | Neural Networks, Manifolds, and Topology |
14 Feedforward Neural Networks, Backpropagation | Eisenstein 3.1-3.3 |
15 Neural Net Implementation | |
16 Neural Net Training, Optimization |
Dropout: a simple way to prevent neural networks from overfitting, Srivastava et al. 2014
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe and Szegedy 2015
Adam: A Method for Stochastic Optimization, Kingma and Ba 2015
The Marginal Value of Adaptive Gradient Methods in Machine Learning, Wilson et al. 2017
|
17 Word Embeddings | |
18 Skip-gram | Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al. 2013 |
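A toy sketch of skip-gram with negative sampling, the training objective from Mikolov et al. 2013. The sizes and the uniform negative sampler are simplifications; the real model draws negatives from a unigram^0.75 distribution and subsamples frequent words.

```python
import math, random

def sgns_step(w_in, w_out, center, context, vocab, step=0.05, k=2):
    # one observed (center, context) pair plus k random negatives
    pairs = [(context, 1.0)] + [(random.choice(vocab), 0.0) for _ in range(k)]
    for word, label in pairs:
        score = sum(a * b for a, b in zip(w_in[center], w_out[word]))
        p = 1.0 / (1.0 + math.exp(-score))   # model's P(pair is real)
        g = step * (p - label)
        for i in range(len(w_in[center])):
            ci, oi = w_in[center][i], w_out[word][i]
            w_in[center][i] -= g * oi        # both embedding tables update
            w_out[word][i] -= g * ci

# Toy usage: random init for input vectors, zeros for output vectors.
vocab = ["the", "cat", "sat", "on", "mat"]
w_in = {w: [random.uniform(-0.5, 0.5) for _ in range(8)] for w in vocab}
w_out = {w: [0.0] * 8 for w in vocab}
sgns_step(w_in, w_out, "cat", "sat", vocab)
```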
19 Other Word Embedding Methods |
A Scalable Hierarchical Distributed Language Model, Mnih and Hinton 2008
Neural Word Embedding as Implicit Matrix Factorization, Levy and Goldberg 2014
GloVe: Global Vectors for Word Representation, Pennington et al. 2014
Enriching Word Vectors with Subword Information, Bojanowski et al. 2016
|
20 Bias in Word Embeddings |
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, Bolukbasi et al. 2016
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings, Manzini et al. 2019
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them, Gonen and Goldberg 2019
|
21 Applying Embeddings, Deep Averaging Networks | Deep Unordered Composition Rivals Syntactic Methods for Text Classification, Iyyer et al. 2015 |
22 Part-of-Speech Tagging | |
23 Sequence Labeling, Tagging with Classifiers | |
24 Hidden Markov Models | |
25 HMMs: Parameter Estimation | |
26 HMMs: Viterbi Algorithm | |
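A minimal Viterbi decoder in log space for orientation; `init`, `trans`, and `emit` are log-probability tables keyed by state (and by word, for emissions), an illustrative format rather than the course's.

```python
# Viterbi decoding for an HMM: all scores are log-probabilities.
def viterbi(obs, states, init, trans, emit):
    chart = [{s: init[s] + emit[s][obs[0]] for s in states}]
    back = []
    for w in obs[1:]:
        scores, ptrs = {}, {}
        for t in states:
            best = max(states, key=lambda s: chart[-1][s] + trans[s][t])
            scores[t] = chart[-1][best] + trans[best][t] + emit[t][w]
            ptrs[t] = best
        chart.append(scores)
        back.append(ptrs)
    # follow backpointers from the best final state
    path = [max(states, key=lambda s: chart[-1][s])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))
```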
27 Beam Search | |
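A generic beam search sketch; `next_scores` is a hypothetical callback returning (symbol, logprob) continuations for a prefix, standing in for whatever model (HMM, tagger, seq2seq) supplies the scores.

```python
def beam_search(next_scores, beam_size, length):
    beams = [([], 0.0)]  # (prefix, total logprob)
    for _ in range(length):
        candidates = [(prefix + [sym], lp + sym_lp)
                      for prefix, lp in beams
                      for sym, sym_lp in next_scores(prefix)]
        # keep only the beam_size highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]
```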
28 HMMs for POS Tagging |
TnT - A Statistical Part-of-Speech Tagger, Brants 2000
Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger, Toutanova and Manning 2000
Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? Manning 2011
Natural Language Processing with Small Feed-Forward Networks, Botha et al. 2017
|
29 Conditional Random Fields | |
30 Features for NER | |
31 Inference and Learning in CRFs | |
32 Forward-backward Algorithm | |
33 NER |
Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, Finkel et al. 2005
Design Challenges and Misconceptions in Named Entity Recognition, Ratinov and Roth 2009
Neural Architectures for Named Entity Recognition, Lample et al. 2016
Ultra-Fine Entity Typing, Choi et al. 2018
|
34 Constituency Parsing | |
35 Probabilistic Context-Free Grammars | |
36 CKY Algorithm | |
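A minimal CKY sketch for a PCFG in Chomsky normal form. The grammar format is illustrative: `binary_rules` maps (B, C) to a list of (A, logprob) rules, and `lexical` maps words to (A, logprob) pairs. Backpointers, needed to recover the actual tree, are omitted for brevity.

```python
import math
from collections import defaultdict

def cky(words, binary_rules, lexical):
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {nonterminal: best logprob}
    for i, w in enumerate(words):
        for A, lp in lexical.get(w, []):
            chart[(i, i + 1)][A] = lp
    for width in range(2, n + 1):          # spans, smallest first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split points
                for B, lb in chart[(i, k)].items():
                    for C, lc in chart[(k, j)].items():
                        for A, lr in binary_rules.get((B, C), []):
                            score = lb + lc + lr
                            if score > chart[(i, j)].get(A, -math.inf):
                                chart[(i, j)][A] = score
    return chart[(0, n)]  # best logprob for each root nonterminal
```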
37 Refining Grammars | Accurate Unlexicalized Parsing, Klein and Manning 2003 |
38 Dependencies | Finding Optimal 1-Endpoint-Crossing Trees, Pitler et al. 2013 |
39 Transition-based Dependency Parsing | |
40 State-of-the-art Parsers |
Max-Margin Parsing, Taskar et al. 2004
Less Grammar, More Features, Hall et al. 2014
Neural CRF Parsing, Durrett and Klein 2015
Constituency Parsing with a Self-Attentive Encoder, Kitaev and Klein 2018
Online Large-Margin Training of Dependency Parsers, McDonald et al. 2005
Efficient Third-Order Dependency Parsers, Koo and Collins 2010
Stanford's Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task, Dozat et al. 2017
A Fast and Accurate Dependency Parser using Neural Networks, Chen and Manning 2014
Globally Normalized Transition-Based Neural Networks, Andor et al. 2016
|
41 n-gram LMs | |
42 Smoothing in n-gram LMs | |
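As a reference point for the smoothing methods this unit covers, a one-function sketch of add-k smoothing for a bigram LM; the counts and k are illustrative, and k = 1 gives Laplace smoothing.

```python
from collections import Counter

def addk_bigram_prob(word, prev, bigrams, unigrams, vocab_size, k=0.1):
    # P(word | prev): add k pseudo-counts to every possible bigram
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)

# Toy usage with a two-bigram corpus.
bigrams = Counter({("the", "cat"): 1, ("the", "dog"): 1})
unigrams = Counter({"the": 2})
print(addk_bigram_prob("cat", "the", bigrams, unigrams, vocab_size=10))
```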
43 Neural Language Models | |
44 Basic RNNs, Elman Networks | Understanding LSTM Networks |
45 Gates and LSTMs |
A Primer on Neural Network Models for Natural Language Processing, Goldberg 2015
Understanding LSTM Networks
|
46 RNN Applications | |
47 RNN Language Modeling | |
48 Visualizing LSTMs | Visualizing and Understanding Recurrent Networks, Karpathy et al. 2016 |
49 ELMo |
Deep Contextualized Word Representations, Peters et al. 2018
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, Peters et al. 2019
|
50 Model-Theoretic Semantics | |
51 Montague Semantics | |
52 CCG | Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars, Zettlemoyer and Collins 2005 |
53 Seq2seq models | |
54 Seq2seq models: Training and Implementation | Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, Bengio et al. 2015 |
55 Seq2seq Semantic Parsing | Data Recombination for Neural Semantic Parsing, Jia and Liang 2016 |
56 Attention: Problems with seq2seq models |
Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al. 2015
Addressing the Rare Word Problem in Neural Machine Translation, Luong et al. 2015
|
57 Attention: Model and Implementation |
Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al. 2015
Effective Approaches to Attention-based Neural Machine Translation, Luong et al. 2015
|
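A minimal sketch of one step of dot-product attention in the spirit of Luong et al. 2015: score each encoder state against the decoder query, softmax the scores into weights, and take the weighted sum as the context vector. Plain lists stand in for tensors, and learned projections are omitted.

```python
import math

def attend(query, enc_states):
    # dot-product scores between the query and each encoder state
    scores = [sum(q * h for q, h in zip(query, hs)) for hs in enc_states]
    m = max(scores)                      # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]      # attention over source positions
    context = [sum(w * hs[i] for w, hs in zip(weights, enc_states))
               for i in range(len(enc_states[0]))]
    return context, weights
```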
58 Copying and Pointers |
Addressing the Rare Word Problem in Neural Machine Translation, Luong et al. 2015
Data Recombination for Neural Semantic Parsing, Jia and Liang 2016
|
59 Word Piece and Byte Pair Encoding |
Neural Machine Translation of Rare Words with Subword Units, Sennrich et al. 2016
Byte Pair Encoding is Suboptimal for Language Model Pretraining, Bostrom and Durrett 2020
|
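A toy sketch of BPE merge learning as in Sennrich et al. 2016: repeatedly merge the most frequent adjacent symbol pair. The word list is illustrative, and the end-of-word marker and corpus frequency weighting used in practice are simplified away.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    vocab = Counter(tuple(w) for w in words)  # words as tuples of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)      # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():      # apply the merge everywhere
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe(["low", "lower", "lowest"], 3))
```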
60 Transformers | Attention Is All You Need, Vaswani et al. 2017 |
61 Machine Translation Intro | |
62 MT: Framework and Evaluation | |
63 MT: Word Alignment | |
64 MT: IBM Models | HMM-Based Word Alignment in Statistical Translation, Vogel et al. 1996 |
65 Phrase-based Machine Translation |
Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models, Koehn 2004
Minimum Error Rate Training in Statistical Machine Translation, Och 2003
|
66 Syntactic Machine Translation | What's in a translation rule? Galley et al. 2004 |
67 Neural Machine Translation |
Addressing the Rare Word Problem in Neural Machine Translation, Luong et al. 2015
Effective Approaches to Attention-based Neural Machine Translation, Luong et al. 2015
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Wu et al. 2016
Revisiting Low-Resource Neural Machine Translation: A Case Study, Sennrich and Zhang 2019
|
68 BERT: Masked Language Modeling | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. 2019 |
69 BERT: Model and Applications |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. 2019
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, Peters et al. 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, Wang et al. 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach, Liu et al. 2019
|
70 GPT-2 | Language Models are Unsupervised Multitask Learners, Radford et al. 2019 |
70b GPT-3 | Language Models are Few-Shot Learners, Brown et al. 2020 |
71 BART and other pre-training |
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, Lewis et al. 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al. 2020
|
72 Reading comprehension intro | |
73 Reading comprehension: setup and baselines |
MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text, Richardson et al. 2013
SQuAD: 100,000+ Questions for Machine Comprehension of Text, Rajpurkar et al. 2016
|
74 Attentive Reader |
Teaching Machines to Read and Comprehend, Hermann et al. 2015
Reading Wikipedia to Answer Open-Domain Questions, Chen et al. 2017
|
75 Improved Reading Comprehension | Bi-directional Attention Flow for Machine Comprehension, Seo et al. 2017 |
76 BERT for QA | RACE: Large-scale ReAding Comprehension Dataset From Examinations, Lai et al. 2017 |
77 Problems with Reading Comprehension | Adversarial Examples for Evaluating Reading Comprehension Systems, Jia and Liang 2017 |
78 Open-domain QA |
RACE: Large-scale ReAding Comprehension Dataset From Examinations, Lai et al. 2017
Latent Retrieval for Weakly Supervised Open Domain Question Answering, Lee et al. 2019
Natural Questions
|
79 Multi-hop QA |
Understanding Dataset Design Choices for Multi-hop Reasoning, Chen and Durrett 2019
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering, Asai et al. 2020
|
80 Explainability in NLP |
The Mythos of Model Interpretability, Lipton 2016
Deep Unordered Composition Rivals Syntactic Methods for Text Classification, Iyyer et al. 2015
Analysis Methods in Neural Language Processing: A Survey, Belinkov and Glass 2019
|
81 Local Explanations: Highlights |
"Why Should I Trust You?" Explaining the Predictions of Any Classifier, Ribeiro et al. 2016
Axiomatic Attribution for Deep Networks, Sundararajan et al. 2017
|
82 Text Explanations |
Generating Visual Explanations, Hendricks et al. 2016
Explaining Question Answering Models through Text Generation, Latcinnik and Berant 2020
|
83 Model Probing |
BERT Rediscovers the Classical NLP Pipeline, Tenney et al. 2019
What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations, Tenney et al. 2019
|
84 Annotation Artifacts |
Annotation Artifacts in Natural Language Inference Data, Gururangan et al. 2018
Hypothesis Only Baselines in Natural Language Inference, Poliak et al. 2018
Did the Model Understand the Question? Mudrakarta et al. 2018
Understanding Dataset Design Choices for Multi-hop Reasoning, Chen and Durrett 2019
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference, Zellers et al. 2018
|
85 Summarization Intro | |
86 Extractive Summarization |
The use of MMR, diversity-based reranking for reordering documents and producing summaries, Carbonell and Goldstein 1998
LexRank: Graph-based Lexical Centrality as Salience in Text Summarization, Erkan and Radev 2004
A Scalable Global Model for Summarization, Gillick and Favre 2009
Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization, Ghalandari 2017
|
87 Neural Extractive Models | Fine-tune BERT for Extractive Summarization, Liu 2019 |
88 Compressive Summarization |
Jointly Learning to Extract and Compress, Berg-Kirkpatrick et al. 2011
Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints, Durrett et al. 2016
Neural Extractive Text Summarization with Syntactic Compression, Xu and Durrett 2019
|
89 Abstractive Summarization |
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks, Chopra et al. 2016
Get To The Point: Summarization with Pointer-Generator Networks, See et al. 2017
|
90 Pre-trained Summarization and Factuality |
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, Lewis et al. 2019
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, Zhang et al. 2020
Evaluating Factuality in Generation with Dependency-level Entailment, Goyal and Durrett 2020
|
91 Dialogue: Chatbots | |
92 Neural Chatbots |
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses, Sordoni et al. 2015
A Diversity-Promoting Objective Function for Neural Conversation Models, Li et al. 2016
Personalizing Dialogue Agents: I have a dog, do you have pets too? Zhang et al. 2018
|
93 Task-Oriented Dialogue | Wizard of Wikipedia: Knowledge-Powered Conversational Agents, Dinan et al. 2019 |
94 Dialogue and QA |
QuAC: Question Answering in Context, Choi et al. 2018
Interpretation of Natural Language Rules in Conversational Machine Reading, Saeidi et al. 2018
|
95 Morphology | |
96 Morphological Analysis |
Supervised Learning of Complete Morphological Paradigms, Durrett and DeNero 2013
Translating into Morphologically Rich Languages with Synthetic Phrases, Chahuneau et al. 2013
|
97 Cross-lingual Tagging and Parsing |
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections, Das and Petrov 2011
Multi-Source Transfer of Delexicalized Dependency Parsers, McDonald et al. 2011
|
98 Cross-lingual Pre-training |
Massively Multilingual Word Embeddings, Ammar et al. 2016
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, Artetxe and Schwenk 2019
How multilingual is Multilingual BERT? Pires et al. 2019
|
99 Ethical Issues in NLP | |