Syllabus for CS378: Natural Language Processing

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 11am - 12:15pm
TAs: Xi Ye (xiye@cs.utexas.edu), Lokesh Pugalenthi (lokeshpugalenthi@utexas.edu)
See main page for office hours

Description

Natural language processing (NLP) is a subfield of AI focused on solving problems that involve dealing with human language in a sophisticated way: these include information extraction, machine translation, automatic summarization, conversational dialogue, syntactic analysis, and many others. Much of the progress on these problems over the last 25 years has been driven by statistical machine learning and, more recently, deep learning. One distinctive feature of language compared to other types of data is its structured nature: modeling language involves understanding the linguistic phenomena it exhibits and grappling with it as a sequentially structured, tree-structured, or graph-structured entity.

This class is intended to be a survey of modern NLP in two respects. First, it covers the main applications of NLP techniques today, both in academia and in industry, as well as enough linguistics to put these problems in context and understand their challenges. Second, it covers a range of models in structured prediction and deep learning including classifiers, sequence models, statistical parsers, neural network encoders, and encoder-decoder models. We study the models themselves, examples of problems they are applied to, inference methods, parameter estimation, and optimization. Programming assignments involve building scalable machine learning systems for various NLP tasks and seeing how these models can be put into practice.

Prerequisites

Lectures

Lectures are 11:00am-12:15pm Tuesday and Thursday. A complete schedule of lectures and assignments, including readings, is on the main course website.

All lectures will take place in person in JGB 2.216. Recordings via LecturesOnline will be made available for students to watch later. Prerecorded videos will also be used to supplement the lectures, so that class time can focus more on interactive problem solving and question answering.

COVID-19: When attending in person, students are strongly encouraged to wear masks and be vaccinated against COVID-19. Vaccines greatly reduce the likelihood of getting an infection and are particularly effective at preventing severe cases of COVID-19. However, they are not perfectly effective, and mask-wearing additionally reduces the risk of COVID-19 transmission. If you become sick with COVID-19 or any other ailment and are unable to attend class, please contact the instructor if you need accommodation and we will work to support you.

Class Recordings: Class recordings are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction by a student could lead to Student Misconduct proceedings.

Office Hours: Office hours will be held both in person and on Zoom, at the discretion of the course staff. Information will be posted on the main course page at the start of the semester.

Discussions: We will use Ed (edstem) as our discussion board; it offers functionality very similar to Piazza's. You can access it from Canvas under "Ed Discussion".

Coursework

The timeline of assignments is on the course calendar. Assignment specifications, code, and data will be made available on the course website and Canvas. Grading breakdowns are as follows:

Religious Holy Days: A student who is absent from an examination or cannot meet an assignment deadline due to the observance of a religious holy day may take the exam on an alternate day or submit the assignment up to 24 hours late without penalty, if proper notice of the planned absence has been given. Notice must be given at least 14 days prior to the classes which will be missed. For religious holy days that fall within the first 2 weeks of the semester, notice should be given on the first day of the semester. Notice should either be delivered to the instructor in person and signed and dated by the instructor, or sent by email, in which case the student must receive an email confirmation from the instructor.

Illness and Medical Extensions: Extensions may be granted in cases of illness (including COVID-19), medical emergency, or other circumstances. In all cases, the student should inform the course staff as soon as is practical, and the extension must be negotiated before the assignment's original due date.

Midterm Extensions: Any conflict with the take-home midterm exam should be brought up with the course staff as soon as possible. Extensions will typically not be granted for personal travel.

Assignments

The assignments will feature a combination of written questions and coding problems of varying scope. Detailed instructions for completing and submitting each assignment are given with that assignment.

Submission: Assignments will be submitted via Gradescope. Coding portions of assignments will be autograded, and written portions will be assessed by course staff and returned with feedback.

Slip Days: Each student is given 5 slip days to use throughout the term. Any number of these days can be applied to any assignment to extend the deadline for that assignment by that many days. E.g., you can turn in Assignment 1 one day late and Assignment 4 one day late, using two slip days total. Slip days can only be used for assignments and not the midterm or final project. Slip days cannot be used fractionally: submitting an assignment 1 hour late incurs 1 slip day, 25 hours late incurs 2 slip days, etc.
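The rounding rule above amounts to taking the ceiling of the hours late divided by 24. A minimal Python sketch, assuming lateness is measured in hours past the deadline; the function name `slip_days_used` is hypothetical and not part of any course tooling:

```python
import math

def slip_days_used(hours_late: float) -> int:
    """Slip days consumed by a submission `hours_late` hours past the deadline.

    Any partial day counts as a full day, so this is ceil(hours_late / 24).
    An on-time submission (hours_late <= 0) uses no slip days.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)

print(slip_days_used(1))   # 1
print(slip_days_used(25))  # 2
```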

Late Assignments: For each day an assignment is late that is not covered by a slip day or a negotiated extension (described above), 5% of the credit for that assignment will be deducted. For example, an assignment turned in two days late with no slip days applied will automatically lose 10%.
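As an illustration of how slip days and the late penalty combine, here is a small sketch. It assumes (this is not stated in the policy) that the 5% per day is taken off the score earned on the assignment and that the score cannot drop below zero; `late_score` and `covered_days` are hypothetical names:

```python
def late_score(raw_score: float, days_late: int, covered_days: int = 0) -> float:
    """Apply a 5%-per-day deduction for late days not covered by slip days or extensions.

    ASSUMPTION: the deduction is a percentage of the earned score,
    capped so the score never goes below zero.
    """
    uncovered = max(0, days_late - covered_days)
    penalty_pct = min(100, 5 * uncovered)
    return raw_score * (100 - penalty_pct) / 100

print(late_score(100, days_late=2))                  # 90.0
print(late_score(100, days_late=2, covered_days=1))  # 95.0
```

Working in integer percentages avoids floating-point surprises such as `1 - 0.05 * 3` not being exactly `0.85`.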

Responses

There are two types of responses in this class: in-class exercises and social impact responses.

In-class exercises will be conducted via UT Instapoll. The time limit will be set to the maximum allowable (1 week) and the prompt will be given to you, so you do not need to be in lecture to complete these. Some will be graded on participation and some on correctness. We will compute your score excluding your 5 lowest poll responses.
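Dropping the 5 lowest poll responses can be sketched as sorting the scores and averaging everything after the first five. This assumes each poll is scored on a 0-1 scale and that a missed poll counts as 0; `instapoll_average` is a hypothetical helper, not the actual gradebook computation:

```python
def instapoll_average(scores: list[float], num_dropped: int = 5) -> float:
    """Average poll score after discarding the `num_dropped` lowest responses.

    ASSUMPTIONS: scores are on a 0-1 scale and a missed poll is recorded as 0.
    Raises ValueError if no polls remain after dropping.
    """
    kept = sorted(scores)[num_dropped:]
    if not kept:
        raise ValueError("not enough poll scores to drop the lowest ones")
    return sum(kept) / len(kept)

print(instapoll_average([1, 0, 1, 0.5, 0, 1, 0, 0, 0]))  # 0.875
```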

Social impact responses will involve writing out a 1-2 paragraph response to a prompt discussed in lecture. These will be graded based on the substantiveness of the response. These responses cannot be dropped.

Midterm

There will be one in-class midterm as described on the course calendar. Students will be allowed one standard letter (8.5" x 11") page of notes during the exam. Use of electronic communication devices (phones, laptops, calculators, etc.) is banned during the exam.

Final Project

The final project consists of either an in-depth exploration of dataset bias or a project on a topic of your choosing. Both options can be completed individually or in groups of 2; working in groups is encouraged! If you wish to pursue your own project idea (either solo or as a team), your group must write a brief 1-page proposal describing what you plan to do and how you plan to do it, which the course staff will provide feedback on. Note that this proposal is due in advance of the final project release. Independent projects do not necessarily have to "work," but will be held to a high standard in terms of expected effort, insight, and technical sophistication.

Final Grades

Your final grade is computed based on the total points earned across all assignments. The final grade is mapped to a letter as follows, with grades on the boundary receiving the higher grade:

A 100 - 93.3
A- 93.3 - 90.0
B+ 90.0 - 86.6
B 86.6 - 83.3
B- 83.3 - 80.0
C+ 80.0 - 76.6
C 76.6 - 73.3
C- 73.3 - 70.0
D 70 - 65
F below 65

Depending on class performance, the instructors may shift these boundaries down to raise students' grades.
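The grade table can be read as a list of lower cutoffs checked from highest to lowest, with a score exactly on a boundary receiving the higher grade. A minimal sketch using the unshifted boundaries from the table; it does not model any end-of-term curve, and `letter_grade` is a hypothetical name:

```python
# Lower cutoff for each letter grade, highest first (unshifted boundaries).
CUTOFFS = [
    (93.3, "A"),
    (90.0, "A-"),
    (86.6, "B+"),
    (83.3, "B"),
    (80.0, "B-"),
    (76.6, "C+"),
    (73.3, "C"),
    (70.0, "C-"),
    (65.0, "D"),
]

def letter_grade(score: float) -> str:
    """Map a final percentage (0-100) to a letter; boundary scores get the higher grade."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"

print(letter_grade(90.0))  # A-
print(letter_grade(64.9))  # F
```

Using `>=` against each lower cutoff is what gives boundary scores the higher grade.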

Academic Honesty

Please read the department's academic honesty policies. For this course, students are encouraged to discuss lecture material, homework problems, and coding assignments with others! However, your final written solution or source code must be your own, excluding the final project, which may be completed in groups. The take-home exam must be completed independently by each student. Finally, note that you may consult external resources such as blog posts, YouTube videos, academic papers, GitHub repositories, and more. However, your use of such resources, particularly GitHub repositories, must be limited in the same way as discussions with other students: you can look at these to get an idea of how to solve a problem, but you should not take external code and submit it as part of your assignment, except for the final project when it is appropriately attributed.

Be sure you respect these policies when posting on the discussion board. Asking clarifying questions, addressing possible bugs in the provided code, etc. are fair game, but you should not discuss solutions in a substantive way that might spoil them for others. When in doubt, and when posting large amounts of source code, post privately to the instructors.

Students who violate these policies may receive a failing grade on the assignment in question or for the course overall, depending on the instructors' judgment and the severity of the infraction.

Miscellaneous

Disabilities: The university is committed to creating an accessible and inclusive learning environment consistent with university policy and federal and state law. Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities at 512-471-6259. If you are already registered with SSD, please deliver your Accommodation Letter to me as early as possible in the semester so we can discuss your approved accommodations and needs in this course.

Diversity: It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students' learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let the course staff know of ways to improve the effectiveness of the course for you personally or for other students.

Furthermore, at times throughout the semester, we will discuss the broader cultural impact of machine learning, NLP, and language technology. I ask that students approach these topics seriously and recognize the power technology has to both support and undermine efforts to create a more inclusive society.

Personal Pronouns: Professional courtesy and sensitivity are especially important with respect to individuals and topics dealing with differences of race, culture, religion, politics, sexual orientation, gender, gender variance, and nationalities. Class rosters are provided to the instructor with the student's legal name, unless they have added a "preferred name" with the Gender and Sexuality Center. I will gladly honor your request to address you by a name that is different from what appears on the official roster, and by the gender pronouns you use. Please advise me of any changes early in the semester so that I may make appropriate updates to my records. For instructions on how to add your pronouns to Canvas, visit https://utexas.instructure.com/courses/633028/pages/profile-pronouns.