CS395T: Visual Recognition, Fall 2011





Meets:
Wednesdays 4:00-7:00 pm
ACES 3.408

Instructor: Kristen Grauman 
Email: grauman@cs
Office: ACES 3.446 

Office hours: by appointment

When emailing me, please put CS395 in the subject line.

Announcements:

See the schedule for weekly reading assignments.

Project paper drafts due Nov 23.  Details on projects are here.

Course overview:


Topics: This is a graduate seminar course in computer vision.   We will survey and discuss current vision papers relating to object recognition, auto-annotation of images, and scene understanding.  The goals of the course will be to understand current approaches to some important problems, to actively analyze their strengths and weaknesses, and to identify interesting open questions and possible directions for future research.

See the syllabus for an outline of the main topics we'll be covering.

Requirements: Students will be responsible for writing paper reviews each week, participating in discussions, completing one programming assignment, presenting once or twice in class (depending on enrollment), and completing a project (done in pairs). 

Note that presentations are due one week before the slot in which your presentation is scheduled.  This means you will need to read the papers, prepare experiments, create slides, etc. more than one week before the date you are signed up for.  The idea is to meet and discuss ahead of time, so that we can iterate as needed in the week leading up to your presentation. 

More details on the requirements and grading breakdown are here.

Prereqs:  Courses in computer vision and/or machine learning (378/376 Computer Vision and/or 391 Machine Learning, or similar); ability to understand and analyze conference papers in this area; programming required for experiment presentations and projects. 

Please talk to me if you are unsure if the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/image feature a given paper mentions, but you should feel comfortable following the key ideas.



Syllabus overview:

  1. Single-object recognition fundamentals: representation, matching, and classification
    1. Specific objects
    2. Classification and global models
    3. Regions and mid-level representations
  2. Beyond single objects: scenes and properties
    1. Context and scenes
    2. Saliency, importance, attention
    3. Attributes
  3. External input in recognition
    1. Language and text
    2. Interactive learning and recognition
  4. Activity in images and videos
    1. Pictures of people
    2. Activity recognition
  5. Dealing with lots of data/categories
    1. Scaling with a large number of categories
    2. Large-scale search and mining
    3. Automatic summarization

Important dates:


Schedule and papers:


Note:  * = required reading. 
Additional papers are provided for reference, and as a starting point for background reading for projects.
Paper presentations: focus on starred papers (additionally mentioning ideas from others is ok but not necessary).
Experiment presentations: Pick from only among the starred papers.
Date | Topics | Papers and links | Presenters | Items due
Aug 24
Course intro 

[slides]
Topic preferences due via email by Monday August 29
I. Single-object recognition fundamentals: representation, matching, and classification
Aug 31
Recognizing specific objects:

Invariant local features, instance recognition, bag-of-words models

  • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

  • *Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]

  • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]


  • For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

  • Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

  • SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]

  • Bundling Features for Large Scale Partial-Duplicate Web Image Search.  Z. Wu, Q. Ke, M. Isard, and J. Sun.  CVPR 2009.  [pdf]

  • Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002.  [pdf]

  • City-Scale Location Recognition, G. Schindler, M. Brown, and R. Szeliski, CVPR 2007.  [pdf]

  • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]

  • I Know What You Did Last Summer: Object-Level Auto-annotation of Holiday Snaps, S. Gammeter, L. Bossard, T. Quack, L. Van Gool, ICCV 2009.  [pdf]

  • Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval.  O. Chum et al. CVPR 2007.  [pdf]

  • A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]


[slides]

Sept 7
Recognition via classification and global models:

Global appearance models for category and scene recognition, sliding window detection, detection as a binary decision.

  • *A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan.  CVPR 2008.  [pdf]  [code]

  • *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]

  • *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]


  • Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [video] [code] [PASCAL datasets]

  • Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.  [pdf]  [Gist code]

  • Locality-Constrained Linear Coding for Image Classification.  J. Wang, J. Yang, K. Yu,  and T. Huang  CVPR 2010. [pdf] [code]

  • Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]

  • Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]

  • Pyramids of Histograms of Oriented Gradients (pHOG), Bosch and Zisserman. [code]

  • Eigenfaces for Recognition, Turk and Pentland, 1991.  [pdf]

  • Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]

  • Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.  C. Lampert, M. Blaschko, and T. Hofmann.  CVPR 2008.  [pdf]  [code]

  • A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000.  [pdf]

  • Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. CVPR 2005.  [pdf]


[slides]

Sept 14
Regions and mid-level representations

Segmentation, grouping, surface estimation


  • *Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010.  [pdf] [code]

  • *Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005. [pdf]  [web]  [code]

  • *Contour Detection and Hierarchical Image Segmentation.  P. Arbelaez,  M. Maire, C. Fowlkes, and J. Malik. PAMI 2011.  [pdf] [data and code]


  • From Contours to Regions: An Empirical Evaluation.  P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik.  CVPR 2009.  [pdf] [code]

  • Boundary-Preserving Dense Local Regions.  J. Kim and K. Grauman.  CVPR 2011.  [pdf]  [code]

  • Object Recognition as Ranking Holistic Figure-Ground Hypotheses. F. Li, J. Carreira, and C. Sminchisescu. CVPR 2010. [pdf]

  • Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

  • Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman.  CVPR  workshop 2004.  [pdf]  [data]

  • Efficient Region Search for Object Detection.  S. Vijayanarasimhan and K. Grauman. CVPR 2011.  [pdf] [code] [data]

  • Extracting Subimages of an Unknown Category from a Set of Images, S. Todorovic and N. Ahuja, CVPR 2006.  [pdf]

  • Learning Mid-level Features for Recognition. Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. CVPR, 2010. 

  • Class-Specific, Top-Down Segmentation, E. Borenstein and S. Ullman, ECCV 2002.  [pdf]

  • Object Recognition by Integrating Multiple Image Segmentations, C. Pantofaru, C. Schmid, and M. Hebert, ECCV 2008  [pdf]

  • Image Parsing: Unifying Segmentation, Detection, and Recognition. Tu, Z., Chen, Z., Yuille, A.L., Zhu, S.C. ICCV 2003  [pdf]

  • GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

  • Recognition Using Regions.  C. Gu, J. Lim, P. Arbelaez, J. Malik, CVPR 2009.  [pdf] [code]

  • Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.  

  • Co-segmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs, C. Rother, V. Kolmogorov, T. Minka, and A. Blake.  CVPR 2006.  [pdf]

  • Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images.  Y. J. Lee and K. Grauman. CVPR 2010.  [pdf] [data]

  • An Efficient Algorithm for Co-segmentation, D. Hochbaum, V. Singh, ICCV 2009.  [pdf]

  • Normalized Cuts and Image Segmentation, J. Shi and J. Malik.  PAMI 2000.  [pdf]  [code]


  • Greg Mori's superpixel code
  • Berkeley Segmentation Dataset and code
  • Pedro Felzenszwalb's graph-based segmentation code
  • Michael Maire's segmentation code and paper
  • Mean-shift: a Robust Approach Towards Feature Space Analysis [pdf]  [code, Matlab interface by Shai Bagon]
  • David Blei's Topic modeling code
[slides]
Expts: Brian, Cho-Jui
Implementation assignment due Friday Sept 16, 5 PM
II. Beyond single objects: scenes and properties
Sept 21
Context and scenes

Multi-object scenes, inter-object relationships, understanding scenes' spatial layout, 3d context

  • *Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces.  D. Lee, A. Gupta, M. Hebert, and T. Kanade.  NIPS 2010.  [pdf] [code]

  • *Multi-Class Segmentation with Relative Location Prior.  S. Gould, J. Rodgers, D. Cohen, G. Elidan and D.  Koller.  IJCV 2008. [pdf] [code]

  • *Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization.  Torralba, Murphy, and Freeman.  CACM 2009.  [pdf] [related code]


  • Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]

  • TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  J. Shotton, J. Winn, C. Rother, A. Criminisi.  ECCV 2006.  [pdf] [web] [data] [code]

  • Recognition Using Visual Phrases.  M. Sadeghi and A. Farhadi.  CVPR 2011.  [pdf]

  • Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry.  V. Hedau, D. Hoiem, and D. Forsyth.  ECCV 2010 [pdf] [code and data]

  • Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, A. Gupta, A. Efros, and M. Hebert.  ECCV 2010. [pdf]

  • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

  • Geometric Reasoning for Single Image Structure Recovery.  D. Lee, M. Hebert, and T. Kanade.  CVPR 2009.  [pdf]  [web] [code]

  • Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.  [pdf] [web]

  • Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.  [pdf]  [slides]  [SVM struct code] [data]

  • Closing the Loop in Scene Interpretation.  D. Hoiem, A. Efros, and M. Hebert.  CVPR 2008.  [pdf]

  • Decomposing a Scene into Geometric and Semantically Consistent Regions, S. Gould, R. Fulton, and D. Koller, ICCV 2009.  [pdf]  [slides]

  • Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.  [pdf] [code]

  • An Empirical Study of Context in Object Detection, S. Divvala, D. Hoiem, J. Hays, A. Efros, M. Hebert, CVPR 2009.  [pdf]  [web]

  • Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.  [pdf]

  • Context Based Object Categorization: A Critical Survey, C. Galleguillos and S. Belongie.  [pdf]

  • What, Where and Who? Classifying Events by Scene and Object Recognition, L.-J. Li and L. Fei-Fei, ICCV 2007. [pdf]

  • Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Unsupervised Framework, L-J. Li, R. Socher, L. Fei-Fei, CVPR 2009.  [pdf]

Papers: Nishant, Jung
Expts: Saurajit


Sept 28
Saliency and attention

Among all items in the scene, which deserve attention (first)?

  • *A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]

  • *Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code by Vicente Ordonez]

  • *Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video.  X. Ren and C. Gu.  CVPR 2010 [pdf] [videos] [data]

  • *What Do We Perceive in a Glance of a Real-World Scene?  L. Fei-Fei, A. Iyer, C. Koch, and P. Perona.  Journal of Vision, 2007.  [pdf]


  • Interesting Objects are Visually Salient.  L. Elazary and L. Itti.  Journal of Vision, 8(3):1–15, 2008.  [pdf]

  • Accounting for the Relative Importance of Objects in Image Retrieval.  S. J. Hwang and K. Grauman.  BMVC 2010.  [pdf] [web] [data]

  • Some Objects are More Equal Than Others: Measuring and Predicting Importance, M. Spain and P. Perona.  ECCV 2008.  [pdf]

  • What Makes an Image Memorable?  P. Isola et al. CVPR 2011. [pdf]
  • The Discriminant Center-Surround Hypothesis for Bottom-Up Saliency. D. Gao, V. Mahadevan, and N. Vasconcelos. NIPS, 2007.  [pdf]

  • Category-Independent Object Proposals.  I. Endres and D. Hoiem.  ECCV 2010.  [pdf]  [code]

  • What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]

  • A Principled Approach to Detecting Surprising Events in Video.  L. Itti and P. Baldi.  CVPR 2005  [pdf]

  • Optimal Scanning for Faster Object Detection,  N. Butko, J. Movellan.  CVPR 2009.  [pdf]

  • What Attributes Guide the Deployment of Visual Attention and How Do They Do It? J. Wolfe and T. Horowitz. Nature Reviews Neuroscience, 5:495–501, 2004.  [pdf]

  • Visual Correlates of Fixation Selection: Effects of Scale and Time. B. Tatler, R. Baddeley, and I. Gilchrist. Vision Research, 45:643, 2005.  [pdf]

  • Objects Predict Fixations Better than Early Saliency.  W. Einhauser, M. Spain, and P. Perona. Journal of Vision, 8(14):1–26, 2008.  [pdf]

  • Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags.  S. J. Hwang and K. Grauman.  CVPR 2010.  [pdf]  [data]

  • Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video.  S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Messner, G. Bradski, P. Baumstarck, S. Chung, A. Ng.  IJCAI 2007.  [pdf]

  • Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006. [pdf]  [web]

  • Determining Patch Saliency Using Low-Level Context, D. Parikh, L. Zitnick, and T. Chen. ECCV 2008.  [pdf]

  • Visual Recognition and Detection Under Bounded Computational Resources, S. Vijayanarasimhan and A. Kapoor.  CVPR 2010.

  • Key-Segments for Video Object Segmentation.  Y. J. Lee, J. Kim, and K. Grauman.  ICCV 2011  [pdf]

  • Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features on Object Search.  A. Torralba, A. Oliva, M. Castelhano, J. Henderson.  [pdf] [web]

  • The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, G. Zelinsky, W. Zhang, B. Yu, X. Chen, D. Samaras, NIPS 2005.  [pdf]

Papers: Lu Xia
Expts: Larry


Oct 5
Attributes:

Visual properties, learning from natural language descriptions, intermediate representations

  • *Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

  • *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009.  [pdf]  [web] [data]

  • *Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [lfw data] [pubfig data]


  • Relative Attributes.  D. Parikh and K. Grauman.  ICCV 2011.  [pdf]  [data]

  • A Discriminative Latent Model of Object Classes and Attributes.  Y. Wang and G. Mori.  ECCV, 2010.  [pdf]

  • Learning Visual Attributes, V. Ferrari and A. Zisserman, NIPS 2007.  [pdf] 

  • Learning Models for Object Recognition from Natural Language Descriptions, J. Wang, K. Markert, and M. Everingham, BMVC 2009.  [pdf]

  • FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf]

  • Attribute-Centric Recognition for Cross-Category Generalization.  A. Farhadi, I. Endres, D. Hoiem.  CVPR 2010.  [pdf]

  • Automatic Attribute Discovery and Characterization from Noisy Web Data.  T. Berg et al.  ECCV 2010.  [pdf]  [data]

  • Attributes-Based People Search in Surveillance Environments.  D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk.  WACV 2009.  [pdf] [project page]

  • Image Region Entropy: A Measure of "Visualness" of Web Images Associated with One Concept.  K. Yanai and K. Barnard.  ACM MM 2005.  [pdf]

  • What Helps Where And Why? Semantic Relatedness for Knowledge Transfer. M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych and B. Schiele. CVPR 2010.  [pdf]

  • Recognizing Human Actions by Attributes.  J. Liu, B. Kuipers, S. Savarese, CVPR 2011.  [pdf]

  • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

Papers: Saurajit
Expts: Qiming, Harsh
Proposal abstracts due Friday Oct 7, 5 PM
III. External input in recognition
Oct 12
Language and description

Discovering the correspondence between words and other language constructs and images, generating descriptions

  • *Baby Talk: Understanding and Generating Image Descriptions.  Kulkarni et al.  CVPR 2011.  [pdf]

  • *Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, A. Gupta and L. Davis, ECCV 2008.  [pdf]

  • *Learning Sign Language by Watching TV (using weakly aligned subtitles), P. Buehler, M. Everingham, and A. Zisserman. CVPR 2009.  [pdf]  [data] [web]


  • Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth. ECCV 2002.  [pdf]  [data]

  • The Mathematics of Statistical Machine Translation: Parameter Estimation.  P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer.  Association for Computational Linguistics, 1993.  [pdf] (background for Duygulu et al. paper)

  • How Many Words is a Picture Worth?  Automatic Caption Generation for News Images.  Y. Feng and M. Lapata.  ACL 2010.  [pdf]
  • Matching words and pictures. K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. Jordan.  JMLR, 3:1107–1135, 2003.  [pdf]

  • Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation.  L. Jie, B. Caputo, and V. Ferrari.  NIPS 2009.  [pdf]

  • Watch, Listen & Learn: Co-training on Captioned Images and Videos.  S. Gupta, J. Kim, K. Grauman, and R. Mooney.  ECML 2008.  [pdf]

  • Systematic Evaluation of Machine Translation Methods for Image and Video Annotation, P. Virga, P. Duygulu, CIVR 2005.  [pdf]
  • Localizing Objects and Actions in Videos Using Accompanying Text.  Johns Hopkins University Summer Workshop Report.  J. Neumann et al.  2010.  [pdf]  [web]
Papers: Chris
Expts: Jae, Naga


Oct 19
Interactive learning and recognition

Human-in-the-loop learning, active annotation collection, crowdsourcing



  • *Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds.  S. Vijayanarasimhan and K. Grauman.  CVPR 2011.  [pdf]

  • *Visual Recognition with Humans in the Loop.  Branson S., Wah C., Babenko B., Schroff F., Welinder P., Perona P., Belongie S.  ECCV 2010. [pdf]  [Caltech/UCSD Visipedia project]  [data]

  • *The Multidimensional Wisdom of Crowds.  Welinder P., Branson S., Belongie S., Perona, P. NIPS 2010. [pdf]  [code]

  • *What’s It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations.  S. Vijayanarasimhan and K. Grauman.  CVPR 2009.  [pdf] [data] [code]


  • iCoseg: Interactive Co-segmentation with Intelligent Scribble Guidance, D. Batra, A. Kowdle, D. Parikh, J. Luo and T. Chen. CVPR 2010.  [pdf] [web]

  • Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI, 2004.

  • Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise.  J. Whitehill et al.  NIPS 2009.  [pdf]
  • Utility Data Annotation with Amazon Mechanical Turk. A. Sorokin and D. Forsyth. Wkshp on Internet Vision, 2008.

  • Far-Sighted Active Learning on a Budget for Image and Video Recognition.  S. Vijayanarasimhan, P. Jain, and K. Grauman.  CVPR 2010.  [pdf]  [code]

  • Multiclass Recognition and Part Localization with Humans in the Loop.  C. Wah et al. ICCV 2011. [pdf]

  • Multi-Level Active Prediction of Useful Image Annotations for Recognition.  S. Vijayanarasimhan and K. Grauman.  NIPS 2008. [pdf] 

  • Active Learning from Crowds.  Y. Yan, R. Rosales, G. Fung, J. Dy.  ICML 2011.  [pdf]

  • Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles.  P. Donmez and J. Carbonell.  CIKM 2008.  [pdf]
  • Inactive Learning?  Difficulties Employing Active Learning in Practice.  J. Attenberg and F. Provost.  SIGKDD 2011. [pdf]

  • Annotator Rationales for Visual Recognition.  J. Donahue and K. Grauman.  ICCV 2011. [pdf]

  • Interactively Building a Discriminative Vocabulary of Nameable Attributes.  D. Parikh and K. Grauman.  CVPR 2011.  [pdf] [web]

  • Actively Selecting Annotations Among Objects and Attributes.  A. Kovashka, S. Vijayanarasimhan, and K. Grauman.  ICCV 2011  [pdf]

  • Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit.  V. Raykar et al.  ICML 2009.  [pdf]
  • Multi-class Active Learning for Image Classification.  A. J. Joshi, F. Porikli, and N. Papanikolopoulos.  CVPR 2009.  [pdf]

  • GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]

  • Active Learning for Piecewise Planar 3D Reconstruction.  A. Kowdle, Y.-J. Chang, A. Gallagher and T. Chen. CVPR 2011 [pdf] [web]

  • Amazon Mechanical Turk
  • Using Mechanical Turk with LabelMe
Papers: Brian, Harsh
Expts: Yunsik

Proposal extended outline due Friday Oct 21, 5 PM
IV. Activity in images and video
Oct 26
Pictures of people

Finding people and their poses, automatic face tagging


  • *Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev and J. Malik.  ICCV 2009.  [pdf] [code]

  • *Understanding Images of Groups of People, A. Gallagher and T. Chen, CVPR 2009.  [pdf]  [web] [data]

  • *Real-Time Human Pose Recognition in Parts from a Single Depth Image.  J. Shotton et al.  CVPR 2011. [pdf] [video]

  • *"Who are you?" - Learning Person Specific Classifiers from Video, J. Sivic, M. Everingham, and A. Zisserman, CVPR 2009.  [pdf] [data] [KLT tracking code]


  • Contextual Identity Recognition in Personal Photo Albums. D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen. CVPR 2007.  [pdf]

  • Fast Pose Estimation with Parameter Sensitive Hashing.  G. Shakhnarovich, P. Viola, T. Darrell, ICCV 2003.  [pdf]

  • Finding and Tracking People From the Bottom Up.  D. Ramanan, D. A. Forsyth.  CVPR 2003.  [pdf]

  • Where’s Waldo: Matching People in Images of Crowds.  R. Garg, D. Ramanan, S. Seitz, N. Snavely. CVPR 2011.  [pdf]

  • Autotagging Facebook: Social Network Context Improves Photo Annotation, by  Z. Stone, T. Zickler, and T. Darrell.  CVPR Internet Vision Workshop 2008.   [pdf]

  • Efficient Propagation for Face Annotation in Family Albums. L. Zhang, Y. Hu, M. Li, and H. Zhang.  MM 2004.  [pdf]

  • Progressive Search Space Reduction for Human Pose Estimation.  Ferrari, V., Marin-Jimenez, M. and Zisserman, A.  CVPR 2008.  [pdf] [web] [code]

  • Leveraging Archival Video for Building Face Datasets, by D. Ramanan, S. Baker, and S. Kakade.  ICCV 2007.  [pdf]
  • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]

  • Face Discovery with Social Context.  Y. J. Lee and K. Grauman.  BMVC 2011.  [pdf]

  • “Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.  [pdf]  [web]  [data]

  • Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities. Yao, B., Fei-Fei, L.  CVPR 2010.

  • A Face Annotation Framework with Partial Clustering and Interactive Labeling.  R. X. Y. Tian, W. Liu, F. Wen, and X. Tang.  CVPR 2007.  [pdf] [web]

  • From 3D Scene Geometry to Human Workspace.  A. Gupta et al.  CVPR 2011.  [pdf] [web]

  • Pictorial Structures Revisited: People Detection and Articulated Pose Estimation.  M. Andriluka et al. CVPR 2009.  [pdf]  [code]

Papers: Sunil, Larry
Expts: Nishant, Jung


Nov 2
Activity recognition

Recognizing and localizing human actions in video

  • *Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web] [data]

  • *A Hough Transform-Based Voting Framework for Action Recognition.  A. Yao, J. Gall, L. Van Gool.  CVPR 2010.  [pdf] [code/data]

  • *Beyond Actions: Discriminative Models for Contextual Group Activities.  T. Lan, Y. Wang, W. Yang, and G. Mori.  NIPS 2010.  [pdf] [data]


  • Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]

  • Learning Realistic Human Actions from Movies.  I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld.  CVPR 2008.  [pdf]  [data]

  • Understanding Egocentric Activities.  A. Fathi, A. Farhadi, J. Rehg.  ICCV 2011. [pdf]

  • Exploiting Human Actions and Object Context for Recognition Tasks.  D. Moore, I. Essa, and M. Hayes.  ICCV 1999.  [pdf]

  • A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]

  • Recognizing Actions at a Distance.  A. Efros, G. Mori, J. Malik.  ICCV 2003.  [pdf] [web]

  • Activity Recognition from First Person Sensing.  E. Taralova, F. De la Torre, M. Hebert  CVPR 2009 Workshop on Egocentric Vision  [pdf]

  • Action Recognition from a Distributed Representation of Pose and Appearance, S. Maji, L. Bourdev, J.  Malik, CVPR 2011.  [pdf]  [code]

  • Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition.  A. Kovashka and K. Grauman.  CVPR 2010.  [pdf]

  • Temporal Causality for the Analysis of Visual Events.  K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg.  CVPR 2010.  [pdf] [Georgia Tech Computational Behavior Science project]

  • Modeling Activity Global Temporal Dependencies using Time Delayed Probabilistic Graphical Model.  Loy, Xiang & Gong ICCV 2009.  [pdf]

  • What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes.  D. Kuettel et al.  CVPR 2010.  [pdf]

  • Learning Actions From the Web.  N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff.  ICCV 2009.  [pdf]

  • Content-based Retrieval of Functional Objects in Video Using Scene Context.  S. Oh, A. Hoogs, M. Turek, and R. Collins.  ECCV 2010.  [pdf]
Papers: Qiming, Yunsik
Expts: Lu Xia


V. Dealing with lots of data/categories
Nov 9
Scaling with a large number of categories

Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships

  • *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]

  • *What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei.  ECCV 2010.  [pdf]

  • *Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition.  T. Gao and D. Koller.  ICCV 2011.  [pdf] [code]


  • Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010. [pdf]

  • Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008.  [pdf] [data]

  • Cross-Generalization: Learning Novel Classes from a Single Example by Feature Replacement.  CVPR 2005.  [pdf]

  • 80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008.  [pdf] [web]

  • Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]

  • Learning Generative Visual Models from Few Training Examples: an Incremental Bayesian Approach Tested on 101 Object Categories. L. Fei-Fei, R. Fergus, and P. Perona. CVPR Workshop on Generative-Model Based Vision. 2004.  [pdf] [Caltech101]

  • Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. S. Fidler and A. Leonardis.  CVPR 2007  [pdf]

  • Exploiting Object Hierarchy: Combining Models from Different Category Levels, A. Zweig and D. Weinshall, ICCV 2007 [pdf]

  • Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]

  • Sequential Learning of Reusable Parts for Object Detection.  S. Krempp, D. Geman, and Y. Amit.  2002  [pdf]

  • ImageNet: A Large-Scale Hierarchical Image Database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, CVPR 2009 [pdf]  [data]

  • Semantic Label Sharing for Learning with Many Categories.  R. Fergus et al.  ECCV 2010.  [pdf]

  • Learning a Tree of Metrics with Disjoint Visual Features.  S. J. Hwang, K. Grauman, F. Sha.  NIPS 2011. 

Papers: Cho-Jui, Si Si
Expts: Lu Pan


Nov 16
Large-scale search and mining

Scalable retrieval algorithms for massive databases, mining for themes

  • *VisualRank: Applying PageRank to Large-Scale Image Search.  Y. Jing and S. Baluja.  PAMI 2008.  [pdf]

  • *Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code]

  • *Video Mining with Frequent Itemset Configurations.  T. Quack, V. Ferrari, and L. Van Gool.  CIVR 2006.  [pdf]


  • Learning Binary Projections for Large-Scale Image Search.  K. Grauman and R. Fergus.  Chapter (draft) to appear in Registration, Recognition, and Video Analysis, R. Cipolla, S. Battiato, and G. Farinella, Editors.  [pdf]

  • World-scale Mining of Objects and Events from Community Photo Collections.  T. Quack, B. Leibe, and L. Van Gool.  CIVR 2008.  [pdf]

  • Interest Seam Image.  X. Zhang, G. Hua, L. Zhang, H. Shum.  CVPR 2010.  [pdf]

  • Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  [code]

  • Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]

  • FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf]

  • Efficiently Searching for Similar Images.  K. Grauman.  Communications of the ACM, 2009.  [CACM link]

  • Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008.  [pdf]

  • Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.  [pdf]

  • Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]


Papers: Naga, Jae
Expts: Si Si


Nov 23
Summarization

Video synopsis, discovering repeated objects, visualization

  • *Webcam Synopsis: Peeking Around the World, by Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg, ICCV 2007.  [pdf] [web]

  • *Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]

  • *Summarizing Visual Data Using Bi-Directional Similarity.  D. Simakov, Y. Caspi, E. Shechtman, M. Irani.  CVPR 2008.  [pdf] [video]


  • Fast Unsupervised Ego-Action Learning for First-Person Sports Video.  K. Kitani, T. Okabe, Y. Sato, A. Sugimoto.  CVPR 2011.  [pdf]

  • Scene Summarization for Online Image Collections.  I. Simon, N. Snavely, S. Seitz.  ICCV 2007.  [pdf]  [web]

  • VideoCut: Removing Irrelevant Frames by Discovering the Object of Interest. D. Liu, G. Hua, T. Chen.  ECCV 2010.  [pdf]

  • Video Epitomes. V. Cheung, B. J. Frey, and N. Jojic.  CVPR 2005. [pdf] [web] [code]

  • Making a Long Video Short. A. Rav-Acha, Y. Pritch, and S. Peleg.  CVPR 2006. [pdf]

  • Structural Epitome: A Way to Summarize One's Visual Experience.  N. Jojic, A. Perina, V. Murino.  NIPS 2010.  [pdf] [data]

  • Video Abstraction: A Systematic Review and Classification.  B. Truong and S. Venkatesh.  ACM 2007.  [pdf]

  • Shape Discovery from Unlabeled Image Collections.  Y. J. Lee and K. Grauman.  CVPR 2009.  [pdf]
  • Detecting and Sketching the Common.  S. Bagon, O. Brostovski, M. Galun, M. Irani.  CVPR 2010.  [pdf]
  • Object-Graphs for Context-Aware Category Discovery.  Y. J. Lee and K. Grauman.  CVPR 2010.  [pdf] [code]

  • Unsupervised Object Discovery: A Comparison.  T. Tuytelaars et al.  IJCV 2009.  [pdf]

Papers: Lu Pan
Expts: Sunil, Chris

Final paper drafts due Wed Nov 23
Nov 30
Final project presentations in class


Final papers due Tues Dec 6, 5 PM


Other useful links: