CS 395T: Visual Recognition and Search, Spring 2009

Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification, by A. Frome, Y. Singer, F. Sha, J. Malik. ICCV 2007.

[pdf]

Video Google: A Text Retrieval Approach to Object Matching in Videos, by J. Sivic and A. Zisserman, ICCV 2003.

[pdf] [demo]

Proximity Distribution Kernels for Geometric Context in Category Recognition, by H. Ling and S. Soatto. CVPR 2007.

[pdf] [PASCAL datasets] [Graz dataset]

Additional code / software:

LIBSVM: library for Support Vector Machines and tool for precomputed kernel matrices

Caltech-101 kernels and results from Anna Bosch and Andrew Zisserman

Part-based models

Object Class Recognition by Unsupervised Scale Invariant Learning, by R. Fergus, P. Perona, and A. Zisserman. CVPR 2003.

[pdf] [datasets]

Combined Object Categorization and Segmentation with an Implicit Shape Model, by B. Leibe, A. Leonardis, and B. Schiele. ECCV Workshop on Statistical Learning in Computer Vision, 2004.

[pdf] [code] [video1] [video2]

A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan. CVPR 2008.

[pdf] [code]

Additional code / software:

Simple parts-and-structure detector (by Fergus/FeiFei/Torralba) http://people.csail.mit.edu/fergus/iccv2005/partsstructure.html

Image annotation process

LabelMe: a Database and Web-based Tool for Image Annotation, by B. Russell, A. Torralba, K. Murphy, and W. Freeman, IJCV 2008.

[pdf] [web]

Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006.

[pdf]

GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, and A. Blake, SIGGRAPH 2004.

[pdf] [demo video]

Multi-Level Active Prediction of Useful Image Annotations for Recognition, by S. Vijayanarasimhan and K. Grauman, NIPS 2008.

[pdf]

Additional code / software / demos:

ESP Game and other games, Luis von Ahn et al.

CAPTCHA: Telling Humans and Computers Apart Automatically

II. Surrounding cues

Inferring 3D cues from a single image

Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005.

[pdf] [web] [code]

Depth Estimation using Monocular and Stereo Cues, by A. Saxena, J. Schulte, and A. Ng. IJCAI 2007.

[pdf] [web]

Inferring Spatial Layout from A Single Image via Depth-Ordered Grouping, by S. Yu, H. Zhang, and J. Malik, Workshop on Perceptual Organization in Computer Vision, 2008.

[pdf] [slides] [data]

Additional code / software:

Try LabelMe’s 3D pop-up feature

Greg Mori’s superpixels code

Pedro Felzenszwalb’s segmentation code

Hoiem et al. Automatic photo pop-up

Try Stanford Make3D demo

Scene recognition

Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, by A. Oliva and A. Torralba, IJCV 2001.

[pdf] [code]

A Bayesian Hierarchical Model for Learning Natural Scene Categories, by L. Fei-Fei and P. Perona. CVPR 2005.

[pdf]

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, by S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006.

[pdf] [slides] [dataset] [libpmk_spatial] [Matlab code]

Additional code / software / data:

Scene Understanding Symposium

100 natural scenes from Fei-Fei et al.

13 natural scene categories dataset

David Blei’s topic modeling code

Context

Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.

[pdf]

Contextual Priming for Object Detection, by A. Torralba. IJCV, 2003.

[pdf] [web1] [web2]

Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich and S. Belongie, CVPR 2008.

[pdf]

Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.

[pdf] [web]

Additional code / software / data:

Survey on context in recognition by Galleguillos et al.

LabelMe dataset

III. Data-driven visual learning

Leveraging internet data

IM2GPS: Estimating Geographic Information from a Single Image, by J. Hays and A. Efros. CVPR 2008.

[pdf] [web]

80 Million Tiny Images: a Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman, PAMI 2008.

[pdf] [web] [Wordnet]

Scene Segmentation Using the Wisdom of Crowds, by I. Simon and S. Seitz. ECCV 2008.

[pdf]

Harvesting Image Databases from the Web, by F. Schroff, A. Criminisi, and A. Zisserman, ICCV 2007.

[pdf]

World-scale Mining of Objects and Events from Community Photo Collections, by T. Quack, B. Leibe, and L. Van Gool, CIVR 2008.

[pdf]

Additional code / data / papers / demos:

Tamara Berg’s Animals on the Web data

Florian Schroff’s page on Harvesting Image Databases from the web

Rob Fergus’s dataset for Learning Object Categories from Google’s Image Search

Code for finding and downloading images on Flickr, by James Hays

Creating and Exploring a Large Photorealistic Virtual Space, Sivic et al.

Semantic Robot Vision Challenge

Text, language, and imagery

“Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.

[pdf] [web] [data]

Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.

[pdf] [web]

Movie/Script: Alignment and Parsing of Video and Text Transcription, by T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar, ECCV 2008.

[pdf] [videos]

Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, by A. Gupta and L. Davis, ECCV 2008.

[pdf]

Additional code / data:

Wordnet

Subrip for subtitle extraction

Reuters captioned photos

Sonal Gupta’s dataset of sports videos with commentary

Face data from Buffy episode, from Oxford Visual Geometry Group

Unsupervised learning and discovery

Discovering Objects and Their Location in Images, by J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, ICCV 2005.

[pdf] [web]

Unsupervised Discovery of Action Classes, by Y. Wang, H. Jiang, M. Drew, Z-N. Li and G. Mori, CVPR 2006.

[pdf] [web]

Detecting Irregularities in Images and in Video, by O. Boiman and M. Irani, ICCV 2005.

[pdf] [web]

IV. Searching and browsing visual content

Fast indexing and search

Scalable Recognition with a Vocabulary Tree, by D. Nister and H. Stewenius, CVPR 2006.

[pdf]

Fast Image Search for Learned Metrics, by P. Jain, B. Kulis, and K. Grauman, CVPR 2008.

[pdf] [slides]

Efficient Near-Duplicate Detection and Sub-Image Retrieval, by Y. Ke, R. Sukthankar, and L. Huston, Multimedia 2004.

[pdf]

Additional code / data / references:

Oxford project on object retrieval with vocabulary trees

LSH homepage

LSH Matlab code by Greg Shakhnarovich

Nearest neighbor datasets from Vassilis Athitsos

Electronic copy of the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice (UT EID required)

Searching in Metric Spaces, a survey by Chavez et al. ACM Computing Surveys, Vol. 33, No. 3, September 2001, pp. 273–321

Small Codes and Large Image Databases for Recognition, by A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008.

[pdf] [slides]

Object Retrieval with Large Vocabularies and Fast Spatial Matching, by J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.

[pdf]

Browsing: query refinement and summarization

Nonchronological Video Synopsis and Indexing, by Y. Pritch, A. Rav-Acha, and S. Peleg, TPAMI 2008.

[pdf] [web]

CuZero: Embracing the Frontier of Interactive Visual Search for Informed Users, by E. Zavesky and S-F. Chang, MIR 2008.

[pdf]

Photo Tourism: Exploring Photo Collections in 3D, by N. Snavely, S. Seitz, and R. Szeliski, SIGGRAPH 2006.

[pdf] [web]

Graph-Cut Transducers for Relevance Feedback in Content Based Image Retrieval, by H. Sahbi, J-Y. Audibert, and R. Keriven, ICCV 2007.

[pdf]

Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs, by X. Li, C. Wu, C. Zach, S. Lazebnik, and J. Frahm, ECCV 2008.

[pdf] [web]

Additional references / demos / data:

UW Community Photo Collections Webpage

Survey by Xiang Zhou and Thomas Huang on relevance feedback for CBIR, 2001

Baeza-Yates & Ribeiro-Neto Chapter 5 on query operations