CS378: Natural Language Processing (Spring 2020)

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, WAG 214
Instructor Office Hours: Tuesdays 1pm-2pm and Wednesdays 10am-11am, GDC 3.812
TA: Yasumasa Onoe (yasumasa@utexas.edu), Proctor: Shrey Desai (shreydesai@utexas.edu)
TA Office Hours: All in GDC 1.302:

Piazza

Note that this is an old version of this course (Spring 2020 edition).

Description

This course provides an introduction to modern natural language processing using machine learning and deep learning approaches. Content includes linguistics fundamentals (syntax, semantics, distributional properties of language), machine learning models (classifiers, sequence taggers, deep learning models), key algorithms for inference, and applications to a range of problems. Students will get hands-on experience building systems to do tasks including text classification, syntactic analysis, language modeling, and language generation.

Requirements

Syllabus

Updated Syllabus (post COVID-19)

Detailed syllabus with course policies

Assignments:

Assignment 0: Warmup [nyt dataset] [tokenizer.py]

Assignment 1: Sentiment Classification [code and dataset download]

Assignment 2: Feedforward Neural Networks and Optimization [code and dataset download]

Assignment 3: Sequence Modeling and Parsing [code and dataset on Canvas]

Midterm Topics [last year's midterm / solutions, in-class review / solutions]

Assignment 4: Character Language Modeling with RNNs [code and dataset download]

Assignment 5: Machine Translation [code and dataset download]

Final Project: Independent Project (propose by March 31) or Question Answering [GitHub repo]

Readings: Textbook readings are assigned to complement the material discussed in lecture. You may find it useful to do these readings before lecture as preparation or after lecture as review. Paper readings supplement the course material if you are interested in diving deeper into particular topics.

The chief text in this course is Eisenstein: Natural Language Processing, available as a free PDF online. For deep learning techniques, this text will be supplemented with selections from Goldberg: A Primer on Neural Network Models for Natural Language Processing. (Another generally useful NLP book is Jurafsky and Martin: Speech and Language Processing (3rd ed. draft), with many draft chapters available for free online; however, we will not be using it much for this course.)

Readings for future lectures are tentative and subject to change.

Date | Topics | Readings | Assignments
Jan 21 | Introduction [4pp] | | A0 out
Jan 23 | Classification 1: Features, Perceptron | Eisenstein 2.0 (= intro to ch 2), 2.1, 2.3.1, 4.1, 4.3, perc_lecture_plot.py | Jan 24: A0 due / A1 out
Jan 28 | Classification 2: Logistic Regression, Sentiment Analysis [4pp] | Classification lecture note, Jurafsky and Martin 5.0-5.3, Pang+ Thumbs Up, Wang+Manning |
Jan 30 | Classification 3: Optimization, Multiclass, Examples [4pp] | Multiclass lecture note, Eisenstein 2.4.1, 2.5, 2.6, 4.2, Stanford Sentiment, Schwartz+ Authorship |
Feb 4 | Neural 1: Feedforward, Backpropagation [4pp] | Eisenstein 3.0-3.3, Goldberg 4 |
Feb 6 | Neural 2: Implementation, Word embeddings intro [4pp] | Eisenstein 3.3, Goldberg 3, 6, ffnn_example.py, DANs, Init and backprop | A1 due / A2 out
Feb 11 | Guest Lecture: Aishwarya Padmakumar | |
Feb 13 | Neural 3: Word embeddings, Evaluation [4pp] | Eisenstein 14.5-14.6, Goldberg 5, word2vec, GloVe, Bias |
Feb 18 | Sequence 1: Tagging, POS, HMMs | Eisenstein 7.1-7.4, 8.1 |
Feb 20 | Sequence 2: HMMs, Viterbi | Eisenstein 7.3-7.4, Viterbi lecture note | A2 due / A3 out
Feb 25 | Sequence 3: Beam Search, POS, CRFs/NER [4pp] | Eisenstein 7.5-7.6, Manning POS, POS with FFNNs |
Feb 27 | Trees 1: Grammar [4pp], PCFGs, CKY | Eisenstein 10.1-3, 10.4.1 |
Mar 3 | Trees 2: Better grammars [4pp], Dependency 1 | Eisenstein 10.5, 11.1, Unlexicalized parsing |
Mar 5 | Trees 3: Shift-Reduce Parsing, State-of-the-art Parsers [4pp] | Eisenstein 11.3-4, Chen and Manning, Parsey |
Mar 10 | LM 1: Ngrams / Midterm review | Eisenstein 6.1-6.2 | A3 due Monday, Mar 9
Mar 12 | MIDTERM (in-class) | |
Mar 17 | NO CLASS (SPRING BREAK) | |
Mar 19 | NO CLASS (SPRING BREAK) | |
Mar 24 | NO CLASS (EXTRA SPRING BREAK) | |
Mar 26 | Test lecture (EXTRA SPRING BREAK) | |
Mar 31 | LM 2: RNNs | Eisenstein 6.3-6.5, Olah Understanding LSTMs, LSTM PyTorch documentation, lstm_lecture.py | A4 out / Custom FP proposals due
April 2 | LM 3: Impl (slides 1pp / 4pp) | Karpathy Visualizing RNNs, Linzen Assessing LSTMs, RNNs with PyTorch |
April 7 | MT 1: Phrase-based MT, Alignment | Eisenstein 18.1-18.2, 18.4, Michael Collins IBM Models 1+2, JHU slides, History of MT |
April 9 | MT 2: Phrase-based Decoding (slides 1pp / 4pp) | Eisenstein 18.3 |
April 14 | MT 3: Seq2seq, attention | Eisenstein 18.3, Attention | A4 due / A5 out
April 16 | MT 4: systems / QA 1: Semantic representations, semantic parsing (slides 1pp / 4pp) | Eisenstein 12, Freebase QA, Zettlemoyer, Jia |
April 21 | QA 2: Reading comprehension | Eisenstein 17.5, Stanford Attentive Reader, SQuAD, BiDAF, DrQA, QA span visualization | A5 due / FP out
April 23 | QA 3 / Transfer 1: ELMo (slides 1pp / 4pp) | ELMo |
April 28 | Transfer 2: Transformers / BERT (slides 1pp / 4pp) | BERT, GPT, Transformers, Illustrated Transformer |
April 30 | Pre-training Applications [4pp] | Eisenstein 19.3, Diversity, PersonaChat, Alexa Team Gunrock |
May 5 | Multilingual and Cross-Lingual NLP [4pp] | |
May 7 | Wrapup + Ethics [4pp] | | Final project due May 13