CS378: Natural Language Processing (Spring 2019)

NOTE: This page is for an old semester of this class

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, GDC 5.302
Instructor Office Hours: Tuesday 1pm-2pm / Wednesday 11am-12pm GDC 3.420
TAs: Jiacheng Xu (jcxu@cs.utexas), Shivangi Mahto (shivangi@cs.utexas)
TA Office Hours: Monday/Wednesday 1pm-2pm (Jiacheng) GDC 1.302 Desk 1/2, Thursday 2pm-3pm (Shivangi) GDC 1.302 Desk 3

Piazza

Description

This course provides an introduction to modern natural language processing using machine learning and deep learning approaches. Content includes linguistics fundamentals (syntax, semantics, distributional properties of language), machine learning models (classifiers, sequence taggers, deep learning models), key algorithms for inference, and applications to a range of problems. Students will get hands-on experience building systems to do tasks including text classification, syntactic analysis, language modeling, and language generation.

Requirements

CS 429
Recommended: CS 331, familiarity with probability and linear algebra, programming experience in Python
Helpful: Exposure to AI and machine learning (e.g., CS 342/343/363)

Syllabus

Detailed syllabus with course policies

Assignments:

Assignment 0: Warmup [nyt dataset]

Assignment 1: Sentiment Classification [code and dataset download]

Assignment 2: Feedforward Neural Networks [code and dataset download]

Assignment 3: Sequence Modeling and Parsing [code and dataset on Canvas]

Midterm: topics and practice questions

Assignment 4: Character Language Modeling with RNNs [code and dataset download]

Final Project

Readings: Textbook readings are assigned to complement the material discussed in lecture. You may find it useful to do these readings before lecture as preparation or after lecture as review. Paper readings are intended to supplement the course material if you are interested in diving deeper into particular topics.

The chief text in this course is Eisenstein: Natural Language Processing, available as a free PDF online. For deep learning techniques, this text will be supplemented with selections from Goldberg: A Primer on Neural Network Models for Natural Language Processing. (Another generally useful NLP book is Jurafsky and Martin: Speech and Language Processing (3rd ed. draft), with many draft chapters available for free online; however, we will not be using it for this course.)

Readings for future lectures are tentative and subject to change.

Date	Topics	Readings	Assignments
Jan 22	Introduction [4pp]		A0 out
Jan 24	Classification I: Features, Naive Bayes	Eisenstein 2.0 (= intro to ch 2), 2.1, 4.1, 4.3	A0 due Friday / A1 out
Jan 29	Classification II: Perceptron, Logistic Regression	Eisenstein 2.2, 2.4, Pang+ Thumbs Up, Wang+Manning
Jan 31	Classification III: Multiclass	Eisenstein 2.4.1, 2.5, 4.2, Schwartz+ Authorship
Feb 5	Neural I: Feedforward [4pp]	Eisenstein 3.0-3.3, Goldberg 3-4, ffnn_example.py
Feb 7	Neural II: Implementation, Word embeddings [4pp]	Eisenstein 3.3, Goldberg 6, ffnn_example.py	A1 due / A2 out
Feb 12	Neural III: Word embeddings, NNs for NLP [4pp]	Eisenstein 14.5-14.6, Goldberg 5, word2vec, GloVe, NLP with FFNNs, DANs
Feb 14	Sequence I: Tagging, POS, HMMs	Eisenstein 7.1-7.4, 8.1, Manning POS
Feb 19	Sequence II: HMMs, Viterbi, Beam Search	Eisenstein 7.3-7.4, Viterbi lecture note
Feb 21	Sequence III: CRFs, NER / Trees I: Grammar [4pp]	Eisenstein 7.5-7.6, 10.1-10.2	A2 due / A3 out
Feb 26	Trees II: PCFGs, CKY	Eisenstein 10.3, 10.4.1
Feb 28	Trees III: Better grammars, Dependency I [4pp]	Eisenstein 10.5, 11.1, Unlexicalized parsing
Mar 5	Trees IV: Dependency II	Eisenstein 11.3-4
Mar 7	Information Extraction [4pp]
Mar 12	LM I: Ngrams / Midterm review	Eisenstein 6.1-6.2	A3 due Monday 3/11
Mar 14	MIDTERM (in-class)
Mar 19	NO CLASS (SPRING BREAK)
Mar 21	NO CLASS (SPRING BREAK)
Mar 26	LM II: LSTMs	Eisenstein 6.3-6.5, Olah Understanding LSTMs	A4 out
Mar 28	LM III: Impl / MT I: Intro [4pp]	Eisenstein 18.1-18.2, Karpathy Visualizing RNNs
April 2	MT II: Phrase-based	Eisenstein 18.2, 18.4, Pharaoh
April 4	MT III: Decoding, Seq2seq [4pp]	Eisenstein 18.3
April 9	MT IV: Seq2seq (cont'd), attention [4pp]	Eisenstein 18.3, Attention	A4 due / FP out
April 11	DIAL I: Chatbots [4pp]	Eisenstein 19.3.3, Diversity, PersonaChat, Alexa Team Gunrock	FP proposal due Friday 4/12
April 16	DIAL II: Task-oriented [4pp]	Eisenstein 19.3.1-2
April 18	Neural IV: Transfer Learning [4pp]	ELMo
April 23	Neural V: Transformers [4pp]	BERT, GPT, Transformers, Illustrated Transformer
April 25	QA I: Semantic representations	Eisenstein 12, Freebase QA
April 30	QA II: Semantic parsing [4pp]	Zettlemoyer, Jia
May 2	QA III: Reading comprehension	Eisenstein 17.5, Stanford Attentive Reader, SQuAD, BiDAF
May 7	Multilingual Methods [4pp]
May 9	Wrapup + Ethics [4pp]