CS 395T: Visual Recognition and Search, Spring 2009

Syllabus and selected papers

 

Overview

 

I.                    Categorizing and matching objects

a.      Sliding window detection

b.      Distances and kernels

c.       Part-based models

d.      Image annotation process

 

II.                  Surrounding cues

a.      Inferring 3d cues from a single image

b.      Scene recognition

c.       Context

 

III.                Data-driven visual learning

a.      Leveraging internet data

b.      Text, language, and imagery

c.       Unsupervised learning and discovery

 

IV.                Searching and browsing visual content

a.      Fast indexing and search

b.      Browsing: query refinement and summarization

c.       Social networks and image tagging

 

 

Tentative schedule

 

Details

 

I. Categorizing and matching objects

 

Object detection via appearance and sliding windows

 

Rapid Object Detection Using a Boosted Cascade of Simple Features, by P. Viola and M. Jones. CVPR 2001.

[pdf]  [Face detection in OpenCV]

 

Histograms of Oriented Gradients for Human Detection, by N. Dalal and B. Triggs.  CVPR 2005.

[pdf]  [demo video]  [software]  [PASCAL datasets]

 

Additional code / software:

Pyramid Histogram of Oriented Gradients (PHOG) code from Anna Bosch

 

Features: interest operators, descriptors, regions

 

Object Recognition from Local Scale-Invariant Features, by D. Lowe. ICCV 1999. 

[pdf]  [code]

 

Local Invariant Feature Detectors: A Survey, by T. Tuytelaars and K. Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. 

[pdf]  [code]

 

Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006.

[pdf]

 

Groups of Adjacent Contour Segments for Object Detection, by V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid.  PAMI 2007.

[pdf]  [code]

 

Normalized Cuts and Image Segmentation, by J. Shi and J. Malik.  CVPR 1997.

[pdf]

 

Shape Matching and Object Recognition Using Shape Contexts, by S. Belongie, J. Malik, and J. Puzicha.  PAMI April 2002.

[pdf]  [code]

 

Additional code / software:

Oxford Interest Point Software Webpage

UNC GPU-SIFT implementation

Herbert Bay's SURF features

Greg Mori’s superpixels code

John Lee’s libpmk feature extraction code

Pyramid Histogram of Oriented Gradients (PHOG) code from Anna Bosch

Andrea Vedaldi’s SIFT code

Software from LEAR team at INRIA, including interest point detectors, shape features

Ivan Laptev’s software for space-time interest points and histograms of oriented gradients (HOG) and histograms of optical flow (HOF)
Berkeley Group boundary detection code from David Martin
Graph-based segmentation code from Pedro Felzenszwalb

Distances and kernels, bags-of-words representations

The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, by K. Grauman and T. Darrell.  ICCV 2005.

[pdf]  [web]  [code]  [Caltech101]

 

Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification, by A. Frome, Y. Singer, F. Sha, J. Malik.  ICCV 2007.

[pdf]

 

Video Google: A Text Retrieval Approach to Object Matching in Videos, by J. Sivic and A. Zisserman, ICCV 2003.

[pdf]  [demo]

 

Proximity Distribution Kernels for Geometric Context in Category Recognition, by H. Ling and S. Soatto.  CVPR 2007.

[pdf]  [PASCAL datasets]  [Graz dataset]

 

Additional code / software:

LIBSVM: library for Support Vector Machines and tool for precomputed kernel matrices

Caltech-101 kernels and results from Anna Bosch and Andrew Zisserman

Part-based models

Object Class Recognition by Unsupervised Scale Invariant Learning, by R. Fergus, P. Perona, and A. Zisserman.  CVPR 2003.

[pdf]  [datasets]

 

Combined Object Categorization and Segmentation with an Implicit Shape Model, by B. Leibe, A. Leonardis, and B. Schiele.   ECCV Workshop on Statistical Learning in Computer Vision, 2004.

[pdf]  [code]  [video1]  [video2]

 

A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester, and D. Ramanan.  CVPR 2008.

[pdf]  [code]

 

Additional code / software:

Simple parts-and-structure detector (by Fergus/FeiFei/Torralba) http://people.csail.mit.edu/fergus/iccv2005/partsstructure.html

 

 

Image annotation process


LabelMe: a Database and Web-based Tool for Image Annotation.  B. Russell, A. Torralba, K. Murphy, and W. Freeman, IJCV 2008.

[pdf]  [web]

 

Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006.

[pdf]

 

GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, and A. Blake, SIGGRAPH 2004.

[pdf]  [demo video]

 

Multi-Level Active Prediction of Useful Image Annotations for Recognition, by S. Vijayanarasimhan and K. Grauman, NIPS 2008.

[pdf]

 

Additional code / software / demos:

ESP Game and other games, Luis von Ahn et al.

CAPTCHA: Telling Humans and Computers Apart Automatically

II. Surrounding cues

Inferring 3d cues from a single image

Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005.

[pdf]  [web]  [code]

Depth Estimation using Monocular and Stereo Cues, by A. Saxena, J. Schulte, and A. Ng.  IJCAI 2007.

[pdf]  [web]

 

Inferring Spatial Layout from a Single Image via Depth-Ordered Grouping, by S. Yu, H. Zhang, and J. Malik, Workshop on Perceptual Organization in Computer Vision, 2008.  [pdf]  [slides]  [data]

 

Additional code / software:

Try Labelme’s 3d popup feature

Greg Mori’s superpixels code

Pedro Felzenszwalb’s segmentation code

Hoiem et al. Automatic photo pop-up

Try Stanford Make3D demo

Scene recognition

Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, by A. Oliva and A. Torralba, IJCV 2001.

[pdf]  [code]

 

A Bayesian Hierarchical Model for Learning Natural Scene Categories, by L. Fei-Fei and P. Perona. CVPR 2005.

[pdf]

 

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, by S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006.

[pdf]  [slides]  [dataset]  [libpmk_spatial] [Matlab code]

 

Additional code / software / data:

Scene Understanding Symposium

100 natural scenes from Fei-Fei et al.

13 natural scene categories dataset

David Blei’s topic modeling code

Context

Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.

[pdf]

 

Contextual Priming for Object Detection, by A. Torralba.  IJCV, 2003. 

[pdf]  [web1] [web2]


Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008. 

[pdf]

 

Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006. 

[pdf] [web]

 

Additional code / software / data:

Survey on context in recognition by Galleguillos et al.

Labelme dataset

III. Data-driven visual learning

Leveraging internet data

IM2GPS: Estimating Geographic Information from a Single Image, by J. Hays and A. Efros.  CVPR 2008.

[pdf]  [web]

 

80 Million Tiny Images: a Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman, PAMI 2008.

[pdf] [web]  [Wordnet]

 

Scene Segmentation Using the Wisdom of Crowds, by I. Simon and S. Seitz.  ECCV 2008. 

[pdf]

 

Harvesting Image Databases from the Web, by F. Schroff, A. Criminisi, and A. Zisserman,  ICCV 2007. 

[pdf]

 

World-scale Mining of Objects and Events from Community Photo Collections, by T. Quack, B. Leibe, and L. Van Gool, CIVR 2008. 

[pdf]

 

Additional code / data / papers / demos:

Tamara Berg’s Animals on the Web data

Florian Schroff’s page on Harvesting Image Databases from the web

Rob Fergus’s dataset for Learning Object Categories from Google’s Image Search

Code for finding and downloading images on Flickr, by James Hays

Creating and Exploring a Large Photorealistic Virtual Space, Sivic et al.

Semantic Robot Vision Challenge

Text, language, and imagery

“Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.

[pdf]  [web]  [data]

 

Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004. 

[pdf]  [web]

 

Movie/Script: Alignment and Parsing of Video and Text Transcription, by T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar, ECCV 2008.

[pdf]  [videos]

 

Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, by A. Gupta and L. Davis, ECCV 2008. 

[pdf]

 

Additional code / data:

Wordnet

Subrip for subtitle extraction

Reuters captioned photos

Sonal Gupta’s dataset of sports videos with commentary

Face data from Buffy episode, from Oxford Visual Geometry Group

Unsupervised learning and discovery

Discovering Objects and Their Location in Images, by J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, ICCV 2005. 

[pdf] [web]

           

Unsupervised Discovery of Action Classes, by Y. Wang, H. Jiang, M. Drew, Z-N. Li and G. Mori, CVPR 2006.

[pdf]  [web]

 

Detecting Irregularities in Images and in Video, by O. Boiman, M. Irani, ICCV 2005.

[pdf]  [web]

IV. Searching and browsing visual content

Fast indexing and search

Scalable Recognition with a Vocabulary Tree, by D. Nister and H. Stewenius, CVPR 2006. 

[pdf]

 

Fast Image Search for Learned Metrics.  P. Jain, B. Kulis, and K. Grauman, CVPR 2008. 

[pdf]  [slides]

 

Efficient Near-Duplicate Detection and Sub-Image Retrieval.  Y. Ke, R. Sukthankar, and L. Huston.  ACM Multimedia 2004.  [pdf]

 

Additional code / data / references:

Oxford project on object retrieval with vocabulary trees

LSH homepage

LSH Matlab code by Greg Shakhnarovich

Nearest neighbor datasets from Vassilis Athitsos

Electronic copy of the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice (UT EID required)

Searching in Metric Spaces, a survey by Chavez et al.  ACM Computing Surveys, Vol. 33, No. 3, September 2001, pp. 273–321

Small Codes and Large Image Databases for Recognition, by A. Torralba, R. Fergus, and Y. Weiss.  CVPR 2008. 

[pdf] [slides]

Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf]

 

 

 

Browsing: query refinement and summarization

Nonchronological Video Synopsis and Indexing, by Y. Pritch, A. Rav-Acha, and S. Peleg, TPAMI 2008. 

[pdf] [web]

 

CuZero: Embracing the Frontier of Interactive Visual Search for Informed Users, by E. Zavesky and S-F. Chang, MIR 2008. 

[pdf]

 

Photo Tourism: Exploring Photo Collections in 3D, by N. Snavely, S. Seitz, and R. Szeliski, SIGGRAPH 2006. 

[pdf] [web]

 

Graph-Cut Transducers for Relevance Feedback in Content Based Image Retrieval, by H. Sahbi, J-Y. Audibert, and R. Keriven, ICCV 2007. [pdf]

 

Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs, by X. Li, C. Wu, C. Zach, S. Lazebnik, and J. Frahm, ECCV 2008.  [pdf]  [web]

 

 

Additional references / demos / data:

UW Community Photo Collections Webpage

Survey by Xiang Zhou and Thomas Huang on relevance feedback for CBIR, 2001

Baeza-Yates & Ribeiro-Neto Chapter 5 on query operations

Social networks and image tagging

Autotagging Facebook: Social Network Context Improves Photo Annotation, by Z. Stone, T. Zickler, and T. Darrell.  Internet Vision Workshop 2007. 

[pdf]

 

Learning Tag Relevance by Neighbor Voting for Social Image Retrieval, by X. Li, C. Snoek, and M. Worring.  MIR 2008. 

[pdf]

 

Why We Tag: Motivations for Annotation in Mobile and Online Media, by M. Ames and M. Naaman, CHI 2007. 

[pdf]

 

Additional software / demos / references:

Flickr.com

Tagggit.com