CS378: Natural Language Processing (Spring 2020)

NOTE: This page is for an old semester of this class

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, WAG 214
Instructor Office Hours: Tuesdays 1pm-2pm and Wednesdays 10am-11am, GDC 3.812
TA: Yasumasa Onoe (yasumasa@utexas.edu), Proctor: Shrey Desai (shreydesai@utexas.edu)
TA Office Hours (all in GDC 1.302):

Monday 11am-12pm (Shrey, TA Desk 1)
Tuesday 2pm-3pm (Yasumasa, TA Desk 1)
Wednesday 3pm-4pm (Shrey, TA Desk 1)
Thursday 1pm-2pm (Yasumasa, TA Desk 1)

Piazza

Description

This course provides an introduction to modern natural language processing using machine learning and deep learning approaches. Content includes linguistics fundamentals (syntax, semantics, distributional properties of language), machine learning models (classifiers, sequence taggers, deep learning models), key algorithms for inference, and applications to a range of problems. Students will get hands-on experience building systems to do tasks including text classification, syntactic analysis, language modeling, and language generation.

Requirements

CS 429
Recommended: CS 331, familiarity with probability and linear algebra, programming experience in Python
Helpful: Exposure to AI and machine learning (e.g., CS 342/343/363)

Syllabus

Updated Syllabus (post COVID-19)

~~Detailed syllabus with course policies~~

Assignments:

Assignment 0: Warmup [nyt dataset] [tokenizer.py]

Assignment 1: Sentiment Classification [code and dataset download]

Assignment 2: Feedforward Neural Networks and Optimization [code and dataset download]

Assignment 3: Sequence Modeling and Parsing [code and dataset on Canvas]

Midterm Topics [last year's midterm / solutions, in-class review / solutions]

Assignment 4: Character Language Modeling with RNNs [code and dataset download]

Assignment 5: Machine Translation [code and dataset download]

Final Project: Independent Project (propose by March 31) or Question Answering [GitHub repo]

Readings: Textbook readings are assigned to complement the material discussed in lecture. You may find it useful to do these readings before lecture as preparation or after lecture as review. Paper readings supplement the course material if you are interested in diving deeper into particular topics.

The chief text in this course is Eisenstein: Natural Language Processing, available as a free PDF online. For deep learning techniques, this text will be supplemented with selections from Goldberg: A Primer on Neural Network Models for Natural Language Processing. (Another generally useful NLP book is Jurafsky and Martin: Speech and Language Processing (3rd ed. draft), with many draft chapters available for free online; however, we will not be using it much for this course.)

Readings for future lectures are tentative and subject to change.

Date	Topics	Readings	Assignments
Jan 21	Introduction [4pp]		A0 out
Jan 23	Classification 1: Features, Perceptron	Eisenstein 2.0 (= intro to ch 2), 2.1, 2.3.1, 4.1, 4.3, perc_lecture_plot.py	Jan 24: A0 due / A1 out
Jan 28	Classification 2: Logistic Regression, Sentiment Analysis [4pp]	Classification lecture note, Jurafsky and Martin 5.0-5.3, Pang+ Thumbs Up, Wang+Manning
Jan 30	Classification 3: Optimization, Multiclass, Examples [4pp]	Multiclass lecture note, Eisenstein 2.4.1, 2.5, 2.6, 4.2, Stanford Sentiment, Schwartz+ Authorship
Feb 4	Neural 1: Feedforward, Backpropagation [4pp]	Eisenstein 3.0-3.3, Goldberg 4
Feb 6	Neural 2: Implementation, Word embeddings intro [4pp]	Eisenstein 3.3, Goldberg 3, 6, ffnn_example.py, DANs, Init and backprop	A1 due / A2 out
Feb 11	Guest Lecture: Aishwarya Padmakumar
Feb 13	Neural 3: Word embeddings, Evaluation [4pp]	Eisenstein 14.5-14.6, Goldberg 5, word2vec, GloVe, Bias
Feb 18	Sequence 1: Tagging, POS, HMMs	Eisenstein 7.1-7.4, 8.1
Feb 20	Sequence 2: HMMs, Viterbi	Eisenstein 7.3-7.4, Viterbi lecture note	A2 due / A3 out
Feb 25	Sequence 3: Beam Search, POS, CRFs/NER [4pp]	Eisenstein 7.5-7.6, Manning POS, POS with FFNNs
Feb 27	Trees 1: Grammar [4pp], PCFGs, CKY	Eisenstein 10.1-3, 10.4.1
Mar 3	Trees 2: Better grammars [4pp], Dependency 1	Eisenstein 10.5, 11.1, Unlexicalized parsing
Mar 5	Trees 3: Shift-Reduce Parsing, State-of-the-art Parsers [4pp]	Eisenstein 11.3-4, Chen and Manning, Parsey
Mar 10	LM 1: Ngrams / Midterm review	Eisenstein 6.1-6.2	A3 due Monday, Mar 9
Mar 12	MIDTERM (in-class)
Mar 17	NO CLASS (SPRING BREAK)
Mar 19	NO CLASS (SPRING BREAK)
Mar 24	NO CLASS (EXTRA SPRING BREAK)
Mar 26	Test lecture (EXTRA SPRING BREAK)
Mar 31	LM 2: RNNs	Eisenstein 6.3-6.5, Olah Understanding LSTMs, LSTM PyTorch documentation, lstm_lecture.py	A4 out / Custom FP proposals due
April 2	LM 3: Impl (slides 1pp / 4pp)	Karpathy Visualizing RNNs, Linzen Assessing LSTMs, RNNs with PyTorch
April 7	MT 1: Phrase-based MT, Alignment	Eisenstein 18.1-18.2, 18.4, Michael Collins IBM Models 1+2, JHU slides, History of MT
April 9	MT 2: Phrase-based Decoding (slides 1pp / 4pp)	Eisenstein 18.3
April 14	MT 3: Seq2seq, attention	Eisenstein 18.3, Attention	A4 due / A5 out
April 16	MT 4: systems / QA 1: Semantic representations, semantic parsing (slides 1pp / 4pp)	Eisenstein 12, Freebase QA, Zettlemoyer, Jia
April 21	QA 2: Reading comprehension	Eisenstein 17.5, Stanford Attentive Reader, SQuAD, BiDAF, DrQA, QA span visualization	A5 due / FP out
April 23	QA 3 / Transfer 1: ELMo (slides 1pp / 4pp)	ELMo
April 28	Transfer 2: Transformers / BERT (slides 1pp / 4pp)	BERT, GPT, Transformers, Illustrated Transformer
April 30	Pre-training Applications [4pp]	Eisenstein 19.3, Diversity, PersonaChat, Alexa Team Gunrock
May 5	Multilingual and Cross-Lingual NLP [4pp]
May 7	Wrapup + Ethics [4pp]		Final project due May 13