Syllabus for CS378: Natural Language Processing

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, held on Zoom
TAs: Tanya Goyal (tanya@cs.utexas.edu), Shivang Singh (proctor)
See main page for office hours

Description

For Fall 2020, all components of this course (lectures, office hours, and more) are held entirely online.

Natural language processing (NLP) is a subfield of AI focused on solving problems that involve dealing with human language in a sophisticated way: these include information extraction, machine translation, automatic summarization, conversational dialogue, syntactic analysis, and many others. Much of the progress on these problems over the last 25 years has been driven by statistical machine learning and, more recently, deep learning. One distinctive feature of language compared to other types of data is its structured nature: modeling language involves understanding the linguistic phenomena it exhibits and grappling with it as a sequentially-structured, tree-structured, or graph-structured entity.

This class is intended to be a survey of modern NLP in two respects. First, it covers the main applications of NLP techniques today, both in academia and in industry, as well as enough linguistics to put these problems in context and understand their challenges. Second, it covers a range of models in structured prediction and deep learning including classifiers, sequence models, statistical parsers, neural network encoders, and encoder-decoder models. We study the models themselves, examples of problems they are applied to, inference methods, parameter estimation, and optimization. Programming assignments involve building scalable machine learning systems for various NLP tasks and seeing how these models can be put into practice.

Prerequisites

Lectures

Lectures are 9:30-11:00am Tuesday and Thursday, held remotely on Zoom. A full schedule of lectures and assignments, including readings, is on the main website page. The Zoom lectures will be recorded and made available later for students in the class to watch. We will not distribute these recorded lectures to anyone outside of the class, and you should not either, for privacy and copyright reasons.

Class Recordings: Class recordings are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction by a student could lead to Student Misconduct proceedings.

Prerecorded videos will also be used to supplement the lectures, in order to enable the class time to focus more on interactive problem solving and question answering.

Coursework

The timeline of assignments is on the course calendar. Assignment specifications, code, and data will be made available on the course website and Canvas. Grading breakdowns are as follows:

Religious Holy Days: A student who is absent from an examination or cannot meet an assignment deadline due to the observance of a religious holy day may take the exam on an alternate day or submit the assignment up to 24 hours late without penalty, if proper notice of the planned absence has been given. Notice must be given at least 14 days prior to the classes which will be missed. For religious holy days that fall within the first 2 weeks of the semester, notice should be given on the first day of the semester. Notice should be personally delivered to the instructor and signed and dated by the instructor, or emailed, in which case a student submitting email notification must receive email confirmation from the instructor.

Other Extensions: Extensions may be granted in cases of medical emergency or other circumstances. In all cases, the student should inform the course staff as soon as is practical, and the extension must be negotiated before the assignment's original due date.

Midterm Extensions: Any conflict with the take-home midterm exam should be brought up with the course staff as soon as possible. Extensions will typically not be granted for personal travel.

In-class Exercises

These exercises will be conducted via UT Instapoll. The time limit will be set to the maximum allowable (1 week) and the prompt will be given to you, so you do not need to be in lecture to complete these. Some will be graded on participation and some on correctness. We will compute your score excluding your 5 lowest poll responses.
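As a rough illustration of dropping the 5 lowest poll responses, here is a minimal Python sketch. The function name poll_score, the 0-to-1 score scale for each poll, and the use of a mean over the remaining polls are all assumptions made for illustration; the syllabus does not specify how the remaining responses are aggregated.

```python
def poll_score(scores, num_dropped=5):
    """Mean of poll scores after excluding the lowest `num_dropped`.

    Assumptions (not stated in the syllabus): each poll is scored
    between 0 and 1, and the remaining polls are averaged.
    """
    # Sort ascending so the lowest scores come first, then skip them.
    kept = sorted(scores)[num_dropped:]
    if not kept:
        # Assumption: with no polls left after dropping, report 0.0.
        return 0.0
    return sum(kept) / len(kept)
```

Under these assumptions, a student who missed five polls entirely but got full credit on the rest would still have a poll score of 1.0.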

Assignments

The assignments will feature a combination of written questions and coding assignments of varying scope. Detailed instructions for assignment completion and submission are given with each assignment.

Submission: Assignments will be submitted via Gradescope. Coding portions of assignments will be autograded, and written portions will be assessed by course staff and returned with feedback.

Slip Days: Each student is given 5 slip days to use throughout the term. Any number of these days can be applied to any assignment to extend the deadline for that assignment by that many days. E.g., you can turn in Assignment 1 one day late and Assignment 4 one day late, using two slip days total. Slip days can only be used for assignments and not the midterm or final project. Slip days cannot be used fractionally: submitting an assignment 1 hour late incurs 1 slip day, 25 hours late incurs 2 slip days, etc.
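The slip day rounding rule can be sketched as follows. This is an illustration only; the function name slip_days_needed is hypothetical, and the treatment of a submission exactly 24 hours late (counted here as 1 slip day) is an assumption.

```python
import math

def slip_days_needed(hours_late):
    """Whole slip days consumed by a submission `hours_late` hours late.

    Slip days cannot be used fractionally, so any partial day
    rounds up to a full slip day.
    """
    if hours_late <= 0:
        return 0  # on time: no slip days used
    return math.ceil(hours_late / 24)
```

For example, slip_days_needed(1) gives 1 and slip_days_needed(25) gives 2, matching the cases described above.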

Late Assignments: For each day an assignment is late that is not covered by a slip day or negotiated extension (described above), 15% of the credit for that assignment will be deducted. So, an assignment turned in two days late with no slip days applied will automatically lose 30%.
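The late penalty arithmetic can be sketched as below. The function name late_penalty_percent is hypothetical, and capping the deduction at 100% is an assumption the syllabus does not state.

```python
def late_penalty_percent(days_late, slip_days_applied=0):
    """Percent of credit deducted for a late assignment.

    Days covered by slip days carry no penalty; each remaining
    late day costs 15% of the assignment's credit.
    """
    uncovered_days = max(0, days_late - slip_days_applied)
    # Assumption: the deduction never exceeds the full credit.
    return min(100, 15 * uncovered_days)
```

For example, late_penalty_percent(2) gives 30, while late_penalty_percent(3, slip_days_applied=1) also gives 30 because one of the three late days is covered.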

Midterm

There will be one take-home midterm as described on the course calendar. This exam will be open-book; however, you must do it individually, not consulting with other students.

Final Project

The final project is either an in-depth exploration of question answering or an opportunity for more open-ended exploration of concepts in the course. Both options can be completed individually or in groups of 2; working in groups is encouraged! If you wish to pursue your own independent project, your group must write a brief 1-page proposal describing what you plan to do and how you plan to do it, which the course staff will provide feedback on. Note that this proposal is due in advance of the final project release. Independent projects do not necessarily have to "work," but will be held to a high standard in terms of expected effort, insight, and technical sophistication.

Final Grades

Your final grade is computed based on the total points earned across all assignments. The final grade is mapped to a letter as follows, with grades on the boundary receiving the higher grade:

A 100 - 93.3
A- 93.3 - 90.0
B+ 90.0 - 86.6
B 86.6 - 83.3
B- 83.3 - 80.0
C+ 80.0 - 76.6
C 76.6 - 73.3
C- 73.3 - 70.0
D 70.0 - 65.0
F below 65.0

Depending on class performance, the instructors may shift these boundaries down to raise students' grades.
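The letter-grade mapping above, before any shift in boundaries, can be sketched as follows. The names GRADE_CUTOFFS and letter_grade are hypothetical; the cutoffs are copied from the table, and a score exactly on a boundary receives the higher grade.

```python
# (minimum score, letter) pairs, highest cutoff first.
GRADE_CUTOFFS = [
    (93.3, "A"), (90.0, "A-"),
    (86.6, "B+"), (83.3, "B"), (80.0, "B-"),
    (76.6, "C+"), (73.3, "C"), (70.0, "C-"),
    (65.0, "D"),
]

def letter_grade(score):
    """Map a final numeric score (0-100) to a letter grade."""
    for cutoff, letter in GRADE_CUTOFFS:
        # ">=" gives boundary scores the higher grade.
        if score >= cutoff:
            return letter
    return "F"
```

For example, letter_grade(90.0) gives "A-" rather than "B+", and letter_grade(64.9) gives "F".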

Academic Honesty

Please read the department's academic honesty policies. For this course, students are encouraged to discuss lecture material, homework problems, and coding assignments with others! However, your final written solution or source code must be your own, excluding the final project, which may be completed in groups. The take-home exam must be completed independently by each student. Finally, note that you may consult external resources such as blog posts, YouTube videos, academic papers, GitHub repositories, and more. However, your use of such resources, particularly GitHub repositories, must be limited in the same way as discussions with other students: you can look at these to get an idea of how to solve a problem, but you should not take external code and submit it as part of your assignment, except for the final project when it is appropriately attributed.

Be sure you respect these policies when posting on Piazza. Asking clarifying questions, addressing possible bugs in the provided code, etc. are fair game, but you should not discuss solutions in a substantive way that might spoil them for others. When in doubt, and when posting large amounts of source code, post privately to the instructors.

Students who violate these policies may receive a failing grade on the assignment in question or for the course overall, depending on the instructors' judgment and the severity of the infraction.

Miscellaneous

Disabilities: Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities at 512-471-6259.

Diversity: It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students' learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let the course staff know of ways to improve the effectiveness of the course for you personally or for other students.

Furthermore, at times throughout the semester, we will discuss the broader cultural impact of machine learning, NLP, and language technology. I ask that students approach these topics seriously and recognize the power technology has to both support and undermine efforts to create a more inclusive society.