Visual Object Recognition and Image Search





Meets: Monday June 27 through Friday July 1,
9:00-11:00 and 12:00-14:00 in room A216
 
Instructor: Kristen Grauman, University of Texas at Austin 

Course site: http://www.cs.utexas.edu/~grauman/courses/trento2011/


Announcements:

See the schedule for current reading assignments.  Papers with a star will be discussed in class on the day they are listed.

Slides from lecture are posted here.

Course overview:


This is a graduate course in computer vision.   We will survey and discuss current vision papers relating to object recognition, auto-annotation of images, scene understanding, and large-scale visual search.  Lectures will cover some fundamental algorithms and basics in feature extraction, as well as highlight recent advances in the literature.  Students will read technical papers prior to each class session to allow discussion during class.

Requirements: Students will be responsible for writing paper reviews, participating in discussions, completing one programming assignment, completing in-class exercises, and taking a final exam.

Prerequisites:  Basic knowledge of probability, linear algebra, machine learning; data structures, algorithms; programming experience.  Background in image processing or vision will be useful but is not assumed.


Grades: Grades will be determined by


Reading:  Much of the reading will come from research papers, plus some background from Rick Szeliski's textbook, Computer Vision: Algorithms and Applications.  A draft of the textbook is freely available here.



Syllabus:


Date
Topics
Papers and links (code, data, etc.):  * = required reading.  Additional papers are provided for reference.
Items due
Monday
June 27
Low-level features

Filtering, edges, local feature detection and description

Slides



  • *Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges
  • *Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [esp pp. 178-188, 216-220, 254-255]
  • Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]
  • Speeded-Up Robust Features (SURF).  H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool.  2008 [pdf] [code]
  • Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002.  [pdf]
  • A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]
  • Matching Local Self-Similarities Across Images and Videos, Shechtman and Irani, CVPR 2007.  [pdf]
  • Oxford group interest point software
  • Andrea Vedaldi's VLFeat code, including SIFT, MSER, hierarchical k-means...
  • INRIA LEAR team's software, including interest points, shape features

Tuesday
June 28
Mid-level representations

Segmentation, grouping, and fitting

Slides

  • *Szeliski book: Sec 5.3-5.5 Segmentation, 4.3.2 Hough transform
  • *From Contours to Regions: An Empirical Evaluation.  P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik.  CVPR 2009.  [pdf] [code]
  • *Constrained Parametric Min-Cuts for Automatic Object Segmentation.  J. Carreira and C. Sminchisescu.  CVPR 2010.  [pdf]  [code]
  • *Geometric Context from a Single Image.  D. Hoiem, A. Efros, and M. Hebert.  ICCV 2005.  [pdf]  [code]
  • *GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004.  [pdf]  [project page]
  • Ballard and Brown Hough Transform excerpt [pdf] Hough Transform demo
  • Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman.  CVPR  workshop 2004.  [pdf]  [data]
  • Boundary Preserving Dense Local Regions.  J. Kim and K. Grauman.  CVPR 2011.  [pdf]  [code]
  • Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman.  CVPR 2006.  [pdf] [code]
  • Extracting Subimages of an Unknown Category from a Set of Images, S. Todorovic and N. Ahuja, CVPR 2006.  [pdf]
  • Using Contours to Detect and Localize Junctions in Natural Images.  M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik.  CVPR 2008.  [pdf] [code]
  • Learning to Detect Natural Image Boundaries using Local Brightness, Color, and Texture Cues.  D. Martin, C. Fowlkes, and J. Malik.  PAMI 2004.  [pdf]
  • Co-segmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs, C. Rother, V. Kolmogorov, T. Minka, and A. Blake.  CVPR 2006.  [pdf]
  • Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images.  Y. J. Lee and K. Grauman. CVPR 2010.  [pdf]
  • Category-Independent Object Proposals.  I. Endres and D. Hoiem.  ECCV 2010.  [pdf]  [code]
  • What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]
  • Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008.  [pdf]
  • Normalized Cuts and Image Segmentation, J. Shi and J. Malik.  PAMI 2000.  [pdf]  [code]
  • Greg Mori's superpixel code
  • Berkeley Segmentation Dataset and code
  • Pedro Felzenszwalb's graph-based segmentation code
  • Mean-shift: a Robust Approach Towards Feature Space Analysis [pdf]  [code, Matlab interface by Shai Bagon]
Items due: Reading
Wednesday
June 29
Recognition and retrieval of specific objects

Matching specific instances of objects



  • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]
  • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]
  • *Object Retrieval with Large Vocabularies and Fast Spatial Matching. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman.  CVPR 2007.  [pdf]
  • *Bundling Features for Large Scale Partial-Duplicate Web Image Search.  Z. Wu, Q. Ke, M. Isard, and J. Sun.  CVPR 2009.  [pdf]
  • *World-scale Mining of Objects and Events from Community Photo Collections.  T. Quack, B. Leibe, and L. Van Gool.  CIVR 2008.  [pdf] [project page]
  • Mapping the World's Photos.  D. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg.  WWW 2009.  [pdf]
  • Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]  [VLfeats code]
  • Clues from the Beaten Path: Location Estimation with Bursty Sequences of Tourist Photos.  C.-Y. Chen and K. Grauman. CVPR 2011.  [pdf]  [project page]
  • Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval.  O. Chum et al. CVPR 2007.  [pdf]
  • Spatial Coding for Large Scale Partial-Duplicate Web Image Search.  W. Zhou et al.  MM 2010.  [pdf]
  • Image Retrieval with Geometry-Preserving Visual Phrases.  Y. Zhang, Z. Jia, and T. Chen.  CVPR 2011.  [pdf]
  • Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs.  X. Li, C. Wu, C. Zach, S. Lazebnik, J. Frahm.  ECCV 2008.  [pdf]
  • Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar.  ICCV 2009.  [pdf] [web] [data]
  • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.  [pdf]  [web]
Items due: Reading
Thursday
June 30
Recognition and detection of object categories

Learning models for generic object categories

  • *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]
  • *A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb,  D. McAllester and D. Ramanan.   CVPR 2008.  [pdf]  [code]
  • *What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei.  ECCV 2010.  [pdf]
  • *TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  J. Shotton, J. Winn, C. Rother, A. Criminisi.  ECCV 2006.  [pdf] [web] [data]
  • *Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]
  • Combined Object Categorization and Segmentation with an Implicit Shape Model, by B. Leibe, A. Leonardis, and B. Schiele.   ECCV Workshop on Statistical Learning in Computer Vision, 2004.   [pdf]  [code]
  • Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007.  [pdf]  [code]
  • Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [video] [code] [PASCAL datasets] 
  • The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, Grauman and Darrell.  ICCV 2005.  [pdf]  [web]  [code]
  • Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]
  • Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]
  • Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008.  [pdf]
  • Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, C. Lampert, M. Blaschko, T. Hofmann.  CVPR 2008.  [pdf]
  • Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds.  S. Vijayanarasimhan and K. Grauman.  CVPR 2011.  [pdf]
  • Locality-Constrained Linear Coding for Image Classification.  J. Wang, J. Yang, K. Yu,  and T. Huang  CVPR 2010. [pdf] [code]
  • im2gps: Estimating Geographic Information from a Single Image.  Hays and Efros.  CVPR 2008.  [pdf] [project page, data]
  • Closing the Loop in Scene Interpretation.  D. Hoiem, A. Efros, and M. Hebert.  CVPR 2008.  [pdf]
  • Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001.  [pdf]  [Gist code]
  • Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009.  [pdf]  [slides]  [SVM struct code] [data]
  • Efficient Region Search for Object Detection.  S. Vijayanarasimhan and K. Grauman. CVPR 2011.  [pdf]
  • Context Based Object Categorization: A Critical Survey.  C. Galleguillos and S. Belongie.  [pdf]
  • Efficient Matching of Pictorial Structures. P. Felzenszwalb and D. Huttenlocher. CVPR 2000.  [pdf] [related code]
  • Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]
  • LIBPMK feature extraction code, includes dense sampling
  • LIBSVM library for support vector machines
Items due: Reading
Friday
July 1
Visual search and mining
 
Large-scale search algorithms, discovery

  • *VisualRank: Applying PageRank to Large-Scale Image Search.  Y. Jing and S. Baluja.  PAMI 2008.  [pdf]
  • *FaceTracer: A Search Engine for Large Collections of Images with Faces.  N. Kumar, P. Belhumeur, and S. Nayar.  ECCV 2008.  [pdf]
  • *Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]
  • *80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008.  [pdf] [web]
  • Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code]
  • Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008.  [pdf]
  • Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.  [pdf]
  • Attributes-Based People Search in Surveillance Environments.  D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk.  WACV 2009.  [pdf] [project page]
  • Efficiently Searching for Similar Images.  K. Grauman.  Communications of the ACM, 2009.  [CACM link]
  • Video Mining with Frequent Itemset Configurations.  T. Quack, V. Ferrari, and L. Van Gool.  CIVR 2006.  [pdf]
  • LSH homepage
Items due: Reading

Friday
July 15



Items due: Coding assignment


Other useful links: