CS395T: Structured Models for NLP

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, Garrison Hall 0.132 (GAR)
Instructor Office Hours: Wednesday 10:00am - 12:00pm, GDC 3.420 (additional OHs by appointment)
TA: Ye Zhang
TA Office Hours: Tuesday and Thursday 2pm-3pm, GDC 1.302

Piazza

Description

This class covers a range of topics in structured prediction and deep learning, with a focus on applications to NLP. We discuss model structures that commonly arise in NLP, such as sequence models, tree-structured models, more general graphical models, recurrent neural networks, and convolutional neural networks, as well as the connections between these. We study the models themselves, examples of problems they are applied to, inference methods, parameter estimation (both supervised and unsupervised approaches), and optimization. Programming assignments involve building scalable machine learning systems for various NLP tasks, with a focus on understanding design decisions surrounding modeling, inference, and learning, and how these interact.

Differences from CS388: This class is intended to complement CS388; CS388 is not required as a prerequisite, nor will those who have taken CS388 have seen everything in this class. In particular, this class places greater emphasis on the fundamentals of structured machine learning and covers a wider range of deep learning techniques, while CS388 focuses more on covering broadly important NLP problems and studying the underlying linguistic phenomena.

Requirements

Syllabus

Detailed syllabus with course policies

This course is broken into two halves: the first half covers structured prediction techniques with linear models, and the second revisits these techniques and structures in the context of deep neural networks. Throughout the course, methods will be illustrated via a number of NLP tasks including POS tagging, named entity recognition, syntactic parsing, sentiment analysis, machine translation, image captioning, and others. This schedule is tentative! Because this is the first time this course is being offered, lecture topics at the end may shift around.

Assignments: There are three programming assignments that require implementing models discussed in class. Framework code in Python and datasets will be provided. If you prefer to use another language, that is possible as well, but you'll have to implement some basic file I/O and other parts of the framework code yourself. In addition, there is an open-ended final project to be done either individually or in teams of 2. This project should constitute novel exploration beyond directly implementing concepts from lecture and should result in a report that roughly reads like an NLP/ML conference submission in terms of presentation and scope.
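
For a sense of what reimplementing the framework I/O entails, here is a minimal sketch of a dataset reader. The CoNLL-style format (one word/tag pair per line, blank lines between sentences) and the function name are illustrative assumptions, not the actual project framework, whose data formats may differ:

    # Minimal sketch, not the course framework: reads a hypothetical CoNLL-style
    # file with one "word<TAB>tag" pair per line and blank lines between sentences.
    def read_tagged_sentences(path):
        sentences, current = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:              # blank line marks a sentence boundary
                    if current:
                        sentences.append(current)
                        current = []
                else:
                    word, tag = line.split("\t")
                    current.append((word, tag))
        if current:                       # handle a file with no trailing blank line
            sentences.append(current)
        return sentences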

Samples of successful Project 1 reports: Sample 1 Sample 2

Project 1: CRF for NER [download code]

Project 2: Shift-Reduce Parsing [download code]

Project 3: Neural Networks for Sentiment Analysis [download code and data (20MB)]

Final Project

Readings: These are purely optional and intended to supplement lecture and give you another view of the material. Two main sources will be used:

- Jurafsky and Martin, Speech and Language Processing (3rd edition draft), cited as "JM" in the schedule below
- Goldberg, A Primer on Neural Network Models for Natural Language Processing, cited as "Goldberg" below

Date | Topics | Readings | Assignments
-----|--------|----------|------------
Aug 31 | Introduction [1pp] [4pp] | |
Sept 5 | Binary classification [4pp] | JM 6.1-6.3 |
Sept 7 | Multiclass classification [4pp] | JM 7, Structured SVM secs 1-2 |
Sept 12 | Sequence models I: HMMs [4pp] | JM 9, JM 10.4, Manning POS | P1 out
Sept 14 | Sequence models II: CRFs [4pp] | Sutton CRFs 2.3, 2.6.1, Wallach CRFs tutorial, Illinois NER |
Sept 19 | Sequence models III: Unsupervised [4pp] | JM 9.5, Painless |
Sept 21 | Tree models I: Constituency [4pp] | JM 13.1-13.7, Structural, Lexicalized, State-split |
Sept 26 | Tree models II: Constituency II / Dependency I [4pp]; Tips for Academic Writing [4pp] | |
Sept 28 | Tree models III: Dependency II [4pp] | JM 14.1-14.4, Huang 1-2 | P1 due / P2 out
Oct 3 | Tree models IV: Global Dependency Parsing [4pp] | Parsey, Huang 2 |
Oct 5 | "Loopy" graphical models [4pp] | Skip-chain NER, Joint entity |
Oct 10 | Machine translation [4pp] | HMM alignment, Pharaoh |
Oct 12 | Feedforward neural networks [4pp] | Goldberg 1-4, 6, NLP with FFNNs, DANs |
Oct 17 | NN implementation, word reprs. [4pp] | Goldberg 5, word2vec, GloVe, Dropout | P2 due / P3 out
Oct 19 | RNNs I: Encoders [4pp] | Goldberg 10-11, SNLI, Visualizing |
Oct 24 | RNNs II: Decoders [4pp] | Seq2seq, Attention, Luong Attention |
Oct 26 | CNNs [4pp] | Goldberg 9, Kim, ByteNet |
Oct 31 | Special guest lecture: Ye Zhang | |
Nov 2 | Advanced NNs I: Neural CRFs [4pp] | Collobert and Weston, Neural NER, Neural CRF parsing | P3 due / FP out
Nov 7 | Advanced NNs II: QA/memory networks [4pp] | E2E Memory Networks, CBT, SQuAD, BiDAF |
Nov 9 | Deep generative models/VAE [4pp] | Bowman VAE, Miao VAE | Proposals due
Nov 14 | Summarization [4pp] | MMR, Gillick, Sentence compression, SummaRuNNer, Pointer |
Nov 16 | Special guest lecture: Katrin Erk | |
Nov 21 | Dialogue systems [4pp] | RNN chatbots, Diversity, Goal-oriented, Latent Intention, QA-as-dialogue |
Nov 23 | NO CLASS (Thanksgiving) | |
Nov 28 | Information extraction [4pp] | Distant supervision, RL for slot filling, TextRunner, ReVerb, NELL |
Nov 30 | Wrapup [4pp] | |
Dec 5 | Project presentations I | |
Dec 7 | Project presentations II | |
Dec 15 | | | FP due