Visual Object Recognition and Image Search


Meets: Monday June 27 through Friday July 1, 9:00-11:00 and 12:00-14:00 in room A216

Instructor: Kristen Grauman, University of Texas at Austin

Course site: http://www.cs.utexas.edu/~grauman/courses/trento2011/

Announcements:

See the schedule for current reading assignments. Papers with a star will be discussed in class on the day they are listed.

Slides from lecture are posted here.

Course overview:

This is a graduate course in computer vision. We will survey and discuss current vision papers relating to object recognition, auto-annotation of images, scene understanding, and large-scale visual search. Lectures will cover some fundamental algorithms and basics in feature extraction, as well as highlight recent advances in the literature. Students will read technical papers prior to each class session to allow discussion during class.

Requirements: Students will be responsible for writing paper reviews, participating in discussions, completing one programming assignment, completing in-class exercises, and taking a final exam.

Prerequisites: Basic knowledge of probability, linear algebra, and machine learning; data structures and algorithms; programming experience. Background in image processing or vision will be useful but is not assumed.

Grades: Grades will be determined by

final exam given Friday July 1 in class (35%)
participation and exercises in class (15%)
paper reviews due Monday through Thursday evenings (25%)
one coding assignment (25%), due Friday July 15.
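
The four weights above sum to 100%, so the course grade is a weighted average of the components. A minimal sketch of that arithmetic follows, assuming each component is scored on a 0-100 scale; the scale, the function name, and the dictionary keys are illustrative assumptions, not part of the official grading policy.

```python
# Weights copied from the grading breakdown above; they sum to 1.0 (100%).
WEIGHTS = {
    "final_exam": 0.35,
    "participation": 0.15,
    "paper_reviews": 0.25,
    "coding_assignment": 0.25,
}


def course_grade(scores):
    """Return the weighted average of the component scores.

    `scores` maps each key in WEIGHTS to a score, assumed here to be
    on a 0-100 scale (an assumption made for illustration only).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "final_exam": 90,
        "participation": 100,
        "paper_reviews": 80,
        "coding_assignment": 70,
    }
    print(round(course_grade(example), 2))
```

With these example scores the components contribute 31.5, 15, 20, and 17.5 points respectively.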

Reading: Much of the reading will come from research papers, plus some background from Rick Szeliski's textbook, Computer Vision: Algorithms and Applications. A draft of the textbook is freely available here.

Syllabus:

Each day lists its topics, then papers and links (code, data, etc.), then items due. * = required reading. Additional papers are provided for reference.

Monday June 27
Topics: Low-level features. Filtering, edges, local feature detection and description. Slides
Papers and links:
Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges
Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, 2008. [pdf] [Oxford code] [esp pp. 178-188, 216-220, 254-255]
Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999. [pdf] [code] [other implementations of SIFT] [IJCV]
Speeded-Up Robust Features (SURF). H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. 2008 [pdf] [code]
Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002. [pdf]
A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid. CVPR 2003 [pdf]
Matching Local Self-Similarities Across Images and Videos, Shechtman and Irani, CVPR 2007. [pdf]
Oxford group interest point software
Andrea Vedaldi's VL Feats code, including SIFT, MSER, hierarchical k-means...
INRIA LEAR team's software, including interest points, shape features

Tuesday June 28
Topics: Mid-level representations. Segmentation, grouping, and fitting. Slides
Papers and links:
Szeliski book: Sec 5.3-5.5 Segmentation, 4.3.2 Hough transform
From Contours to Regions: An Empirical Evaluation. P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. CVPR 2009. [pdf] [code]
Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010. [pdf] [code]
Geometric Context from a Single Image. D. Hoiem, A. Efros, and M. Hebert. ICCV 2005. [pdf] [code]
*GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, A. Blake, SIGGRAPH 2004. [pdf] [project page]
Ballard and Brown Hough Transform excerpt [pdf]
Hough Transform demo
Combining Top-down and Bottom-up Segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR workshop 2004. [pdf] [data]
Boundary Preserving Dense Local Regions. J. Kim and K. Grauman. CVPR 2011. [pdf] [code]
Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman. CVPR 2006. [pdf] [code]
Extracting Subimages of an Unknown Category from a Set of Images, S. Todorovic and N. Ahuja, CVPR 2006. [pdf]
Using Contours to Detect and Localize Junctions in Natural Images. M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. CVPR 2008. [pdf] [code]
Learning to Detect Natural Image Boundaries using Local Brightness, Color, and Texture Cues. D. Martin, C. Fowlkes, and J. Malik. PAMI 2004. [pdf]
Co-segmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs, C. Rother, V. Kolmogorov, T. Minka, and A. Blake. CVPR 2006. [pdf]
Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images. Y. J. Lee and K. Grauman. CVPR 2010. [pdf]
Category-Independent Object Proposals. I. Endres and D. Hoiem. ECCV 2010. [pdf] [code]
What is an Object? B. Alexe, T. Deselaers, and V. Ferrari. CVPR 2010. [pdf] [code]
Robust Higher Order Potentials for Enforcing Label Consistency, P. Kohli, L. Ladicky, and P. Torr. CVPR 2008. [pdf]
Normalized Cuts and Image Segmentation, J. Shi and J. Malik. PAMI 2000. [pdf] [code]
Greg Mori's superpixel code
Berkeley Segmentation Dataset and code
Pedro Felzenszwalb's graph-based segmentation code
Mean-shift: a Robust Approach Towards Feature Space Analysis [pdf] [code, Matlab interface by Shai Bagon]
Items due: Reading

Wednesday June 29
Topics: Recognition and retrieval of specific objects. Matching specific instances of objects.
Papers and links:
Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999. [pdf] [code] [other implementations of SIFT] [IJCV]
Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003. [pdf] [demo]
Object Retrieval with Large Vocabularies and Fast Spatial Matching. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. CVPR 2007. [pdf]
Bundling Features for Large Scale Partial-Duplicate Web Image Search. Z. Wu, Q. Ke, M. Isard, and J. Sun. CVPR 2009. [pdf]
*World-scale Mining of Objects and Events from Community Photo Collections. T. Quack, B. Leibe, and L. Van Gool. CIVR 2008. [pdf] [project page]
Mapping the World's Photos. D. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. WWW 2009. [pdf]
Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf] [VLfeats code]
Clues from the Beaten Path: Location Estimation with Bursty Sequences of Tourist Photos. C.-Y. Chen and K. Grauman. CVPR 2011. [pdf] [project page]
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. O. Chum et al. CVPR 2007. [pdf]
Spatial Coding for Large Scale Partial-Duplicate Web Image Search. W. Zhou et al. MM 2010. [pdf]
Image Retrieval with Geometry-Preserving Visual Phrases. Y. Zhang, Z. Jia, and T. Chen. CVPR 2011. [pdf]
Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs. X. Li, C. Wu, C. Zach, S. Lazebnik, J. Frahm. ECCV 2008. [pdf]
Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar. ICCV 2009. [pdf] [web] [data]
Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004. [pdf] [web]
Roweis et al. Astrometry project
CVPR 2009 Workshop on Visual Place Categorization
Code for downloading Flickr images, by James Hays
UW Community Photo Collections homepage
FLANN - Fast Library for Approximate Nearest Neighbors. Marius Muja et al.
Items due: Reading

Thursday June 30
Topics: Recognition and detection of object categories. Learning models for generic object categories.
Papers and links:
Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001. [pdf] [code]
A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan. CVPR 2008. [pdf] [code]
What Does Classifying More than 10,000 Image Categories Tell Us? J. Deng, A. Berg, K. Li and L. Fei-Fei. ECCV 2010. [pdf]
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. J. Shotton, J. Winn, C. Rother, A. Criminisi. ECCV 2006. [pdf] [web] [data]
*Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009 [pdf] [web] [data]
Combined Object Categorization and Segmentation with an Implicit Shape Model, by B. Leibe, A. Leonardis, and B. Schiele. ECCV Workshop on Statistical Learning in Computer Vision, 2004. [pdf] [code]
Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007. [pdf] [code]
Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005. [pdf] [video] [code] [PASCAL datasets]
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, Grauman and Darrell. ICCV 2005. [pdf] [web] [code]
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf] [15 scenes dataset] [libpmk] [Matlab]
Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid. ECCV 2008. [pdf] [web] [Caltech256]
Learning and Using Taxonomies for Fast Visual Categorization, G. Griffin and P. Perona, CVPR 2008. [pdf]
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, C. Lampert, M. Blaschko, T. Hofmann. CVPR 2008. [pdf]
Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. S. Vijayanarasimhan and K. Grauman. CVPR 2011. [pdf]
Locality-Constrained Linear Coding for Image Classification. J. Wang, J. Yang, K. Yu, and T. Huang. CVPR 2010. [pdf] [code]
im2gps: Estimating Geographic Information from a Single Image. Hays and Efros. CVPR 2008. [pdf] [project page, data]
Closing the Loop in Scene Interpretation. D. Hoiem, A. Efros, and M. Hebert. CVPR 2008. [pdf]
Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, Oliva and Torralba, IJCV 2001. [pdf] [Gist code]
Discriminative Models for Multi-Class Object Layout, C. Desai, D. Ramanan, C. Fowlkes. ICCV 2009. [pdf] [slides] [SVM struct code] [data]
Efficient Region Search for Object Detection. S. Vijayanarasimhan and K. Grauman. CVPR 2011. [pdf]
Context Based Object Categorization: A Critical Survey. C. Galleguillos and S. Belongie. [pdf]
Efficient Matching of Pictorial Structures. P. Felzenszwalb and D. Huttenlocher. CVPR 2000. [pdf] [related code]
Sampling Strategies for Bag-of-Features Image Classification. E. Nowak, F. Jurie, and B. Triggs. ECCV 2006. [pdf]
LIBPMK feature extraction code, includes dense sampling
LIBSVM library for support vector machines
Items due: Reading

Friday July 1
Topics: Visual search and mining. Large-scale search algorithms, discovery.
Papers and links:
VisualRank: Applying PageRank to Large-Scale Image Search. Y. Jing and S. Baluja. PAMI 2008. [pdf]
FaceTracer: A Search Engine for Large Collections of Images with Faces. N. Kumar, P. Belhumeur, and S. Nayar. ECCV 2008. [pdf]
Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas. CVPR 2009. [pdf]
80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman. PAMI 2008. [pdf] [web]
Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf] [code]
Fast Image Search for Learned Metrics, P. Jain, B. Kulis, and K. Grauman, CVPR 2008. [pdf]
Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008. [pdf]
Attributes-Based People Search in Surveillance Environments. D. Vaquero, R. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk. WACV 2009. [pdf] [project page]
Efficiently Searching for Similar Images. K. Grauman. Communications of the ACM, 2009. [CACM link]
Video Mining with Frequent Itemset Configurations. T. Quack, V. Ferrari, and L. Van Gool. CIVR 2006. [pdf]
LSH homepage
Items due: Reading

Friday July 15
Items due: Coding assignment
