CS381V: Visual Recognition, Fall 2025

Meets: Thurs 2-5 pm in CBA 4.348

Instructor:
Kristen Grauman
Office: GDC 4.726
Office hours: by appointment (send email)

TAs:
Joungbin An
Arjun Somayazulu
Office: GDC 4S vision lab
Office hours: by appointment (send email)

Requirements

Piazza for accessing assignments, submitting reviews, and asking assignment questions (please enroll yourself).

Canvas for grades.

Course overview

This is an advanced graduate seminar course in computer vision. We will survey and discuss current papers relating to high-level visual understanding (objects, scenes, activities, and multimodal learning), with an emphasis on new problems in video.

The goals of the course are to understand what the important problems are, how they are being approached, and how well current methods work. We will actively analyze strengths and weaknesses, and strive to identify interesting open questions and directions for future research.

The class meets in person and will consist of student presentations about papers, discussion, and intermittent implementation working sessions. Outside class sessions, students will gain hands-on experience via assignments (done in pairs) and a final project (done in a small group).

Auditing the course: Due to the format of the course and classroom, unfortunately we are not able to accommodate auditing. The class sessions are for registered students only.

Requirements

Important details on all the requirements and the grading breakdown are here: Requirements.

Prereqs

Courses in computer vision and machine learning (CS 376 / CS 378H Computer Vision and/or CS 391 Machine Learning and/or CS 395T Deep Learning, or similar); ability to understand and analyze conference papers in this area; programming required for assignments and final project.

Please talk to me if you are unsure if the course is a good match for your background. I generally recommend scanning through papers on the syllabus to gauge what kind of background is expected. I don't assume you are already familiar with every single algorithm/tool/feature a given paper mentions, but you should feel comfortable following the key ideas.

Topics

1. Fundamental toolbox

Image representations, object recognition, and segmentation
3D scenes
Video representations and activity recognition
People – pose, hands, face

2. Activity understanding

World models and generation
Procedural activities
Video QA, captioning, reasoning
Egocentric video and first-person perception

3. Next frontier challenges

Long-form video
Touch & sound
Skill assessment & AI coaching
Vision for science

Details are on the Schedule.