CS378: Natural Language Processing

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, GDC 5.302
Instructor Office Hours: Tuesday 1pm-2pm / Wednesday 11am-12pm GDC 3.420
TAs: Jiacheng Xu (jcxu@cs.utexas), Shivangi Mahto (shivangi@cs.utexas)
TA Office Hours: Monday/Wednesday 1pm-2pm (Jiacheng) GDC 1.302 Desk 1/2, Thursday 2pm-3pm (Shivangi) GDC 1.302 Desk 3

Piazza

Description

This course provides an introduction to modern natural language processing using machine learning and deep learning approaches. Content includes linguistics fundamentals (syntax, semantics, distributional properties of language), machine learning models (classifiers, sequence taggers, deep learning models), key algorithms for inference, and applications to a range of problems. Students will get hands-on experience building systems to do tasks including text classification, syntactic analysis, language modeling, and language generation.

Requirements

Syllabus

Detailed syllabus with course policies

Assignments:

Assignment 0: Warmup [nyt dataset]

Assignment 1: Sentiment Classification [code and dataset download]

Assignment 2: Feedforward Neural Networks [code and dataset download]

Assignment 3: Sequence Modeling and Parsing [code and dataset on Canvas]

Midterm: topics and practice questions

Assignment 4: Character Language Modeling with RNNs [code and dataset download]

Final Project

Readings: Textbook readings are assigned to complement the material discussed in lecture. You may find it useful to do these readings before lecture as preparation or after lecture as review. Paper readings are intended to supplement the course material if you are interested in diving deeper into particular topics.

The primary text for this course is Eisenstein: Natural Language Processing, available as a free PDF online. For deep learning techniques, this text will be supplemented with selections from Goldberg: A Primer on Neural Network Models for Natural Language Processing. (Another generally useful NLP book is Jurafsky and Martin: Speech and Language Processing (3rd ed. draft), with many draft chapters available for free online; however, we will not be using it for this course.)

Readings for future lectures are tentative and subject to change.

Date | Topics | Readings | Assignments
Jan 22 | Introduction [4pp] | | A0 out
Jan 24 | Classification I: Features, Naive Bayes | Eisenstein 2.0 (= intro to ch 2), 2.1, 4.1, 4.3 | A0 due Friday / A1 out
Jan 29 | Classification II: Perceptron, Logistic Regression | Eisenstein 2.2, 2.4, Pang+ Thumbs Up, Wang+Manning |
Jan 31 | Classification III: Multiclass | Eisenstein 2.4.1, 2.5, 4.2, Schwartz+ Authorship |
Feb 5 | Neural I: Feedforward [4pp] | Eisenstein 3.0-3.3, Goldberg 3-4, ffnn_example.py |
Feb 7 | Neural II: Implementation, Word embeddings [4pp] | Eisenstein 3.3, Goldberg 6, ffnn_example.py | A1 due / A2 out
Feb 12 | Neural III: Word embeddings, NNs for NLP [4pp] | Eisenstein 14.5-14.6, Goldberg 5, word2vec, GloVe, NLP with FFNNs, DANs |
Feb 14 | Sequence I: Tagging, POS, HMMs | Eisenstein 7.1-7.4, 8.1, Manning POS |
Feb 19 | Sequence II: HMMs, Viterbi, Beam Search | Eisenstein 7.3-7.4, Viterbi lecture note |
Feb 21 | Sequence III: CRFs, NER / Trees I: Grammar [4pp] | Eisenstein 7.5-7.6, 10.1-10.2 | A2 due / A3 out
Feb 26 | Trees II: PCFGs, CKY | Eisenstein 10.3, 10.4.1 |
Feb 28 | Trees III: Better grammars, Dependency I [4pp] | Eisenstein 10.5, 11.1, Unlexicalized parsing |
Mar 5 | Trees IV: Dependency II | Eisenstein 11.3-4 |
Mar 7 | Information Extraction [4pp] | |
Mar 12 | LM I: Ngrams / Midterm review | Eisenstein 6.1-6.2 | A3 due Monday 3/11
Mar 14 | MIDTERM (in-class) | |
Mar 19 | NO CLASS (SPRING BREAK) | |
Mar 21 | NO CLASS (SPRING BREAK) | |
Mar 26 | LM II: LSTMs | Eisenstein 6.3-6.5, Olah Understanding LSTMs | A4 out
Mar 28 | LM III: Impl / MT I: Intro [4pp] | Eisenstein 18.1-18.2, Karpathy Visualizing RNNs |
April 2 | MT II: Phrase-based | Eisenstein 18.2, 18.4, Pharaoh |
April 4 | MT III: Decoding, Seq2seq [4pp] | Eisenstein 18.3 |
April 9 | MT IV: Seq2seq (cont'd), attention [4pp] | Eisenstein 18.3, Attention | A4 due / FP out
April 11 | DIAL I: Chatbots [4pp] | Eisenstein 19.3.3, Diversity, PersonaChat, Alexa Team Gunrock | FP proposal due Friday 4/12
April 16 | DIAL II: Task-oriented [4pp] | Eisenstein 19.3.1-2 |
April 18 | Neural IV: Transfer Learning [4pp] | ELMo |
April 23 | Neural V: Transformers [4pp] | BERT, GPT, Transformers, Illustrated Transformer |
April 25 | QA I: Semantic representations | Eisenstein 12, Freebase QA |
April 30 | QA II: Semantic parsing [4pp] | Zettlemoyer, Jia |
May 2 | QA III: Reading comprehension | Eisenstein 17.5, Stanford Attentive Reader, SQuAD, BiDAF |
May 7 | Multilingual Methods [4pp] | |
May 9 | Wrapup + Ethics [4pp] | |