CS378: Natural Language Processing

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, GDC 5.302
Instructor Office Hours: Tuesday 1pm-2pm and Wednesday 11am-12pm, GDC 3.420
TAs: Jiacheng Xu (jcxu@cs.utexas), Shivangi Mahto (shivangi@cs.utexas)
TA Office Hours: Monday/Wednesday 1pm-2pm (Jiacheng), GDC 1.302 Desk 1/2; Thursday 2pm-3pm (Shivangi), GDC 1.302 Desk 3

Piazza

Description

This course provides an introduction to modern natural language processing using machine learning and deep learning approaches. Content includes linguistics fundamentals (syntax, semantics, distributional properties of language), machine learning models (classifiers, sequence taggers, deep learning models), key algorithms for inference, and applications to a range of problems. Students will get hands-on experience building systems for tasks including text classification, syntactic analysis, language modeling, and language generation.

Requirements

Syllabus

Detailed syllabus with course policies

Assignments:

Assignment 0: Warmup [nyt dataset]

Assignment 1: Sentiment Classification [code and dataset download]

Assignment 2: Feedforward Neural Networks [code and dataset download]

Readings: Textbook readings are assigned to complement the material discussed in lecture. You may find it useful to do these readings before lecture as preparation or after lecture as review. Paper readings are intended to supplement the course material if you are interested in diving deeper into particular topics.

The chief text in this course is Eisenstein: Natural Language Processing, available as a free PDF online. For deep learning techniques, this text will be supplemented with selections from Goldberg: A Primer on Neural Network Models for Natural Language Processing. (Another generally useful NLP book is Jurafsky and Martin: Speech and Language Processing (3rd ed. draft), with many draft chapters available for free online; however, we will not be using it for this course.)

Readings for future lectures are tentative and subject to change.

Date | Topics | Readings | Assignments
Jan 22 | Introduction [4pp] | | A0 out
Jan 24 | Classification I: Features, Naive Bayes | Eisenstein 2.0 (= intro to ch 2), 2.1, 4.1, 4.3 | A0 due Friday / A1 out
Jan 29 | Classification II: Perceptron, Logistic Regression | Eisenstein 2.2, 2.4, Pang+ Thumbs Up, Wang+Manning |
Jan 31 | Classification III: Multiclass | Eisenstein 2.4.1, 2.5, 4.2, Schwartz+ Authorship |
Feb 5 | Neural I: Feedforward [4pp] | Eisenstein 3.0-3.3, Goldberg 3-4, ffnn_example.py |
Feb 7 | Neural II: Implementation, Word embeddings [4pp] | Eisenstein 3.3, Goldberg 6, ffnn_example.py | A1 due / A2 out
Feb 12 | Neural III: Word embeddings, NNs for NLP [4pp] | Eisenstein 14.5-14.6, Goldberg 5, word2vec, GloVe, NLP with FFNNs, DANs |
Feb 14 | Sequence I: Tagging, POS, HMMs | Eisenstein 7.1-7.4, 8.1, Manning POS |
Feb 19 | Sequence II: HMMs, Viterbi, Beam Search | Eisenstein 7.3-7.4 |
Feb 21 | Sequence III: CRFs, NER / Trees I: Grammar [4pp] | Eisenstein 7.5-7.6, 10.1-10.3 | A2 due / A3 out
Feb 26 | Trees II: PCFGs, CKY | Eisenstein 10.4-10.5 |
Feb 28 | Trees III: Dependency I | Eisenstein 11.1, 11.3 |
Mar 5 | Trees IV: Dependency (cont'd) | |
Mar 7 | Information Extraction | | A3 due
Mar 12 | LM I: Ngrams | |
Mar 14 | MIDTERM (in-class) | |
Mar 19 | NO CLASS (SPRING BREAK) | |
Mar 21 | NO CLASS (SPRING BREAK) | |
Mar 26 | LM II: LSTMs | | A4 out
Mar 28 | MT I: Alignment | |
Apr 2 | MT II: Phrase-based | |
Apr 4 | MT III: Seq2seq | |
Apr 9 | MT IV: Seq2seq (cont'd), attention | | A4 due / FP out
Apr 11 | SPEECH I: Acoustic modeling | |
Apr 16 | DIAL I: Chatbots | |
Apr 18 | DIAL II: Task-oriented | |
Apr 23 | Information Extraction II | |
Apr 25 | QA I: Semantic Representations | |
Apr 30 | QA II: Semantic parsing | |
May 2 | QA III: Reading comprehension | |
May 7 | Multilingual Methods | |
May 9 | Wrapup + Ethics | |