CS395T: Visual Recognition and Search

Spring 2008

Topics

Visual vocabularies
Mining image collections
Fast indexing methods
Faces
Datasets and dataset creation
Near-duplicate detection
Learning distance functions
Place recognition and kidnapped robots
Text/speech and images/video
Context and background knowledge in recognition
Learning about images from keyword-based Web search
Video summarization
Image and video retargeting
Exploring images in 3D
Canonical views and visualization
Shape matching
Detecting abnormal events

Visual vocabularies

 

Words are the basic tokens in a document of text: they allow us to index documents with a keyword search, or to discover topics based on common distributions of words.  What is the analogy for an image?  Visual words are prototypical local features that form a “vocabulary” from which images can be described.  As with documents, they can be a useful representation.  Various recognition approaches exploit a bag-of-visual-words feature space, identifying the vocabulary words by quantizing a sample of local descriptors.  These papers address questions surrounding vocabulary formation, including interest point selection, quantization strategies, and maintaining efficient codebooks.
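
For concreteness, here is a minimal sketch of the standard pipeline (a generic illustration, not the method of any one paper below): cluster a pool of local descriptors with k-means to form the vocabulary, then represent each image as a normalized histogram of visual-word counts.  Descriptor extraction (e.g., SIFT from the interest operators linked below) is assumed to happen elsewhere.

    import numpy as np
    from scipy.cluster.vq import kmeans2, vq

    def build_vocabulary(descriptor_pool, k=1000):
        # descriptor_pool: (N, d) array of local descriptors (e.g., SIFT) sampled
        # from many training images; the k cluster centers are the "visual words".
        words, _ = kmeans2(descriptor_pool.astype(np.float64), k, minit='points')
        return words

    def bag_of_words(image_descriptors, words):
        # Assign each descriptor in one image to its nearest word, then count.
        labels, _ = vq(image_descriptors.astype(np.float64), words)
        hist = np.bincount(labels, minlength=len(words)).astype(np.float64)
        return hist / max(hist.sum(), 1.0)  # L1-normalize for image-size invariance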

 

 

  • *Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  In Proceedings of the European Conference on Computer Vision (ECCV), 2006.  [pdf]

 

  • Visual Categorization with Bags of Keypoints, by G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray.  In Workshop on Statistical Learning in Computer Vision, ECCV, 2004.  [pdf]

 

  • Adapted Vocabularies for Generic Visual Categorization, by F. Perronnin, C. Dance, G. Csurka, M. Bressan, in Proceedings of the European Conference on Computer Vision (ECCV), 2006.  [pdf]

 

  • *Fast Discriminative Visual Codebooks using Randomized Clustering Forests, by A. Moosmann, B. Triggs and F. Jurie.  Neural Information Processing Systems (NIPS), 2006.  [pdf]

 

  • Object Categorization by Learned Universal Visual Dictionary.  J. Winn, A. Criminisi and T. Minka.   In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005.   [pdf]

 

  • Vector Quantizing Feature Space with a Regular Lattice, by T. Tuytelaars and C. Schmid, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • *Scalable Recognition with a Vocabulary Tree, by D. Nister and H. Stewenius, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]

 

  • Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning, by T. Yeh, J. Lee, and T. Darrell.  In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]  [web]

 

 

Related links

 

            Executables for interest operators and descriptors, from Oxford VGG

 

Benchmark database from the University of Kentucky, used in the vocabulary tree paper, plus the semi-processed data.

 

            Libpmk, a library from John Lee that includes hierarchical clustering / vocabulary code

 

            Software from the LEAR team at INRIA, including interest point detectors, shape features, and a randomized-forest image classifier

           

 

 

---

 

Mining image collections

 

Mining large unstructured collections of images can identify common visual patterns and allow the discovery of topics or even categories.  These papers include methods for clustering according to latent topics and repeated configurations of features, mining for association rules, and exploring large image collections interactively.
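
As a concrete anchor for the association-rule paper by Agrawal et al. below, here is a minimal sketch of the generic Apriori algorithm (simplified; see Weka under “Related links” for a real implementation).  Each transaction could be, say, the set of visual words present in an image.

    from itertools import combinations

    def apriori(transactions, min_support):
        # transactions: list of sets of hashable items.
        # min_support: minimum fraction of transactions an itemset must appear in.
        n = len(transactions)
        def support(itemset):
            return sum(itemset <= t for t in transactions) / n
        items = {i for t in transactions for i in t}
        # Frequent 1-itemsets.
        frequent = [{frozenset([i]) for i in items
                     if support(frozenset([i])) >= min_support}]
        k = 1
        while frequent[-1]:
            # Candidate (k+1)-itemsets from unions of frequent k-itemsets.
            candidates = {a | b for a in frequent[-1] for b in frequent[-1]
                          if len(a | b) == k + 1}
            # Prune candidates with any infrequent k-subset, then test support.
            survivors = {c for c in candidates
                         if all(frozenset(s) in frequent[-1] for s in combinations(c, k))
                         and support(c) >= min_support}
            frequent.append(survivors)
            k += 1
        return [s for level in frequent for s in level]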

 

  • Video Data Mining Using Configurations of Viewpoint Invariant Regions, by J. Sivic and A. Zisserman, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.  [pdf]

 

  • Efficient Mining of Frequent and Distinctive Feature Configurations, by T. Quack, V. Ferrari, B. Leibe, and L. Van Gool, In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • Mining Association Rules Between Sets of Items in Large Databases, by R. Agrawal, T. Imielinski, and A. N. Swami.  In Special Interest Group on Management of Data (SIGMOD), 1993.   [pdf]

 

  • Discovering Objects and Their Location in Images, by J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005.  [pdf] [web]

 

 

  • Mining Image Datasets using Perceptual Association Rules, by J. Tesic, S. Newsam, and B. S. Manjunath.  In SIAM’03 Workshop on Mining Scientific and Engineering Datasets, 2003.  [pdf]

 

 

Related links

           

            pLSA implementations

 

            Matlab code and data for affinity propagation, from Dueck & Frey

 

Weka: Java data mining software; includes an implementation of the Apriori algorithm

 

 

---

 

Fast indexing methods

 

Content-based image and video retrieval, as well as example-based recognition systems, require the ability to rapidly search very large image collections.  These papers cover algorithms for fast (often approximate) search, specifically in the context of indexing images or image features.
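
Several of the methods in this area rest on locality-sensitive hashing (see the LSH links below).  This is a minimal sketch of one standard family, random-hyperplane hashing for cosine similarity, not the specific scheme of any paper listed here: vectors with similar directions tend to fall in the same bucket, so exact comparison is only needed against a bucket's worth of candidates.

    import numpy as np
    from collections import defaultdict

    class HyperplaneLSH:
        # One hash table of n_bits random-hyperplane bits; real systems use
        # several tables (and fewer bits per table) to boost recall.
        def __init__(self, dim, n_bits=16, seed=0):
            rng = np.random.default_rng(seed)
            self.planes = rng.standard_normal((n_bits, dim))
            self.buckets = defaultdict(list)

        def _key(self, x):
            # The sign pattern against the random hyperplanes is the hash key.
            return ((self.planes @ x) > 0).tobytes()

        def index(self, vectors):
            for i, v in enumerate(vectors):
                self.buckets[self._key(v)].append(i)

        def query(self, q):
            # Candidate indices sharing q's key; verify these few candidates
            # with an exact distance rather than scanning the whole database.
            return self.buckets.get(self._key(q), [])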

 

 

  • Scalable Recognition with a Vocabulary Tree, by D. Nister and H. Stewenius, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]

 

  • *A Binning Scheme for Fast Hard Drive Based Image Search, F. Fraundorfer, H.  Stewenius, and D. Nister, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.  [pdf]

 

  • *Fast Pose Estimation with Parameter Sensitive Hashing, by G. Shakhnarovich, P. Viola, T. Darrell, In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2003.  [pdf]

 

  • Video Google: A Text Retrieval Approach to Object Matching in Videos, by J. Sivic and A. Zisserman, In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2003.  [pdf]  [web]

 

  • Fast Similarity Search for Learned Metrics.  P. Jain, B. Kulis, and K. Grauman.  UTCS Technical Report #TR-07-48, September 2007.

 

  • *Learning Embeddings for Fast Approximate Nearest Neighbor Retrieval.   V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios, Nearest-Neighbor Methods in Learning and Vision: Theory and Practice, G. Shakhnarovich, T. Darrell and P. Indyk, Editors.  MIT Press, March 2006.  [ps]

 

 

Related links

 

LSH homepage, email authors for code package

 

LSH Matlab code by Greg Shakhnarovich

 

            Nearest neighbor datasets from Vassilis Athitsos

 

            Electronic copy of the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice (UT EID required)

 

 

---

 

Faces

 

These papers consider the problems of detecting faces, recognizing familiar faces, and looking for repeated faces in videos.  A variety of techniques are represented below.
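
The speed of the Viola-Jones detector cited below comes largely from the integral image, which lets any rectangular (Haar-like) feature be evaluated in a constant number of array lookups.  A minimal sketch of that trick:

    import numpy as np

    def integral_image(img):
        # ii[y, x] = sum of img over the rectangle [0:y, 0:x]; the zero row and
        # column of padding make the box-sum formula branch-free at the borders.
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
        return ii

    def box_sum(ii, y, x, h, w):
        # Sum of the h-by-w box with top-left corner (y, x), in four lookups.
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    # A two-rectangle Haar-like feature is just a difference of two box sums,
    # e.g. box_sum(ii, y, x, h, w) - box_sum(ii, y, x + w, h, w).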

 

  • Face Recognition: A Literature Survey, by W. Zhao, R. Chellappa, A. Rosenfeld, and P. Phillips.  In ACM Computing Surveys, 2003. [pdf]

 

  • *Rapid Object Detection Using a Boosted Cascade of Simple Features, by P. Viola and M. Jones, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.  [pdf]

  • Active Appearance Models, by T. F. Cootes, G. J. Edwards, and C. J. Taylor.  IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 23, No. 6, pp. 681-685, 2001.

 

  • *Automatic Cast Listing in Feature-Length Films with Anisotropic Manifold Space, by O. Arandjelovic and R. Cipolla, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]

 

  • Person Spotting: Video Shot Retrieval for Face Sets, J. Sivic, M. Everingham, and A. Zisserman. In International Conference on Image and Video Retrieval (CIVR), 2005.  [pdf]

 

  • Leveraging Archival Video for Building Face Datasets, D. Ramanan, S. Baker, S. Kakade.  In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • Face Recognition by Humans: 19 Results All Computer Vision Researchers Should Know About, by P. Sinha, B. Balas, Y. Ostrovsky, and R. Russell,  Proceedings of the IEEE, Vol. 94, No. 11, November 2006, pp. 1948-1962. [pdf]

 

 

Related links

           

            Intel’s OpenCV library, includes Viola & Jones face detector

           

            Active Appearance Models code from Tim Cootes

 

            Data collections of detected faces, from Oxford VGG

 

Face data from Buffy episode, from Oxford VGG

 

University of Cambridge face data from films [go to Data link]

 

PolarRose.com

 

Pittsburgh Pattern Recognition face detector demo

 

---

 

 

Datasets and dataset creation

 

These papers discuss issues in generating image datasets for recognition research.  Benchmark image datasets allow direct comparisons between recognition algorithms, and having accessible prepared datasets can be critical for the research itself.  The design of an image collection also matters: its degree of variability can influence the assumptions new methods make, or may fail to show off their strengths.  Meanwhile, collecting labeled data is expensive and can be tedious.  These papers include novel ways to gather image collections with less pain, and highlight some of the considerations in database design.  *Coverage of this area should include highlights on recent commonly used datasets.*

 

  • Dataset Issues in Object Recognition, by J. Ponce, T. L. Berg, M. Everingham, D. A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid, B. C. Russell, A. Torralba, C. K. I. Williams, J. Zhang, and A. Zisserman.  In J. Ponce et al. (Eds.): Toward Category-Level Object Recognition, LNCS 4170, pp. 29–48, 2006.  [pdf]

  • Soylent Grid: It's Made of People!, by S. Steinbach, V. Rabaud, and S. Belongie, ICCV Workshop on Interactive Computer Vision, 2007.  [pdf]

 

  • Harvesting Image Databases from the Web, by F. Schroff, A. Criminisi, and A. Zisserman, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

[No demo on this topic.]

 

Related links

 

            Dataset list with links

 

---

 

Near-duplicate detection

 

This problem involves detecting cases where multiple images (or videos) are the same except for some slight alterations.  Near-duplicate detection can be useful for detecting copyright violations or forged images.  These papers include several vision approaches, as well as some papers on the core algorithms often used.
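
Broder's resemblance paper below is the root of many of these systems.  A minimal sketch of the min-hashing idea (simplified; salted hashes stand in for random permutations): the probability that two sets agree on the minimum of a random permutation equals their Jaccard similarity, so a short signature of hash minima yields an unbiased similarity estimate.

    import hashlib

    def minhash_signature(items, n_hashes=64):
        # items: a non-empty set of hashable tokens (e.g., shingles or visual words).
        sig = []
        for i in range(n_hashes):
            # The salt i simulates the i-th random permutation of the item universe.
            sig.append(min(
                int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest()[:8], "big")
                for x in items))
        return sig

    def estimated_jaccard(sig_a, sig_b):
        # The fraction of matching minima estimates the Jaccard similarity
        # of the original sets.
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)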

 

 

  • Efficient Near-Duplicate Detection and Subimage Retrieval, by Yan Ke, Rahul Sukthankar, and Larry Huston, ACM Multimedia 2004.  [pdf]

 

  • Enhancing DPF for Near-replica Image Recognition, by Y. Meng, E. Chang, and B. Li, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2003. [pdf]

 

  • Content-based Copy Detection using Distortion-Based Probabilistic Similarity Search, by A. Joly, O. Buisson, and C. Frélicot.  In IEEE Transactions on Multimedia, 2007.  [pdf]

 

  • Filtering Image Spam with Near-Duplicate Detection, by Zhe Wang, W. Josephson, Q. Lv, M. Charikar, and K. Li.  Proceedings of the 4th Conference on Email and Anti-Spam (CEAS), 2007. [pdf]

 

  • Finding Near-Duplicate Web Pages: A Large-Scale Evaluation of Algorithms, by M. Henzinger.  In ACM Special Interest Group on Information Retrieval (SIGIR), 2006.  (text application)  [pdf]

 

  • On the Resemblance and Containment of Documents, by A. Broder.  In Proceedings of Compression and Complexity of Sequences, 1997.  [pdf]

 

  • Similarity Estimation Techniques from Rounding Algorithms, by M. S. Charikar.  In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.  [ps]

 

  • Scalable Near Identical Image and Shot Detection, by O. Chum, J. Philbin, M. Isard, and A. Zisserman, ACM International Conference on Image and Video Retrieval, 2007. [pdf]

 

 

Related links:

 

            Data from Ke et al. paper

           

            LSH homepage, email authors for code package

 

LSH Matlab code by Greg Shakhnarovich

 

            TRECVID data

 

 

---

 

Learning distance functions

 

The success of any distance-based indexing, clustering, or classification scheme depends critically on the quality of the chosen distance metric, and the extent to which it accurately reflects the true underlying relationships between the examples in a particular data domain. An optimal distance metric should report small distances for examples that are similar in the parameter space of interest (or that share a class label), and large distances for examples that are unrelated.  These papers consider distance learning specifically for image retrieval tasks.
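
As one concrete example, here is a minimal sketch of the Relevant Components Analysis idea behind the Bar-Hillel et al. paper below (see “Related links” for the authors' code; the regularization term here is an added convenience): pool the covariance of points known to be equivalent, and use its inverse as a Mahalanobis metric, which down-weights directions of irrelevant variability.

    import numpy as np

    def rca_metric(chunklets):
        # chunklets: list of (n_i, d) arrays, each holding points known to be
        # "equivalent" (e.g., images of the same object).  Returns M such that
        # d(x, y)^2 = (x - y)^T M (x - y).
        centered = np.vstack([c - c.mean(axis=0) for c in chunklets])
        cov = centered.T @ centered / len(centered)   # within-chunklet covariance
        return np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[1]))  # regularized inverse

    def mahalanobis(x, y, M):
        d = x - y
        return float(np.sqrt(d @ M @ d))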

 

  • Learning Distance Functions for Image Retrieval, by T. Hertz, A. Bar-Hillel and D. Weinshall, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2004.  [pdf]

 

  • Learning a Mahalanobis Metric from Equivalence Constraints, by A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, in Journal of Machine Learning Research (JMLR), 2005.  [pdf]

 

  • *Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification, by A. Frome, Y. Singer, F. Sha, J. Malik, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]  [web]

 

  • *Invariant Large Margin Nearest Neighbor Classifier, by P. Mudigonda, P. Torr, and A. Zisserman, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • Fast Pose Estimation with Parameter Sensitive Hashing, by G. Shakhnarovich, P. Viola, and T. Darrell, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2003.  [pdf]

 

 

 

Related links:

 

            DistBoost code, Hertz et al.

Relevant Components Analysis code, Hertz et al.

 

            DistLearn toolkit

           

            Large Margin Nearest Neighbors code by Weinberger et al.

 

            Nearest neighbor datasets from Vassilis Athitsos

 

---

 

Place recognition and kidnapped robots

 

How can an image of the current scene allow localization or place recognition?  Or, put more dramatically, how can a kidnapped robot that is carried off to an arbitrary location figure out where it is with no prior knowledge of its position?  These papers address this problem, some specifically with a robotics slant, and some in terms of the image-based scene matching problem.

 

  • *Vision-Based Global Localization and Mapping for Mobile Robots, by S. Se, D. Lowe, and J. Little.  IEEE Transactions on Robotics, 2005.  [pdf]

 

  • Image-Based Localisation, by R. Cipolla, D. Robertson, and B. Tordoff.  In Proceedings of the 10th International Conference on Virtual Systems and Multimedia, 2004.  [pdf]

 

  • *Qualitative Image Based Localization in Indoors Environments, by J. Kosecka, L. Zhou, P. Barber, and Z. Duric, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003. [pdf]

 

  • Location Recognition and Global Localization Based on Scale-Invariant Keypoints, by J. Kosecka and X. Yang,  CVPR workshop 2004.  [pdf]

 

  • Searching the Web with Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.  [pdf]

 

  • Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, by O. Chum, J. Philbin, J. Sivic, M. Isard, A. Zisserman, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

 

Related links:

           

            Oxford buildings dataset

 

---

 

Text/speech and images/video

 

Often images or videos are accompanied by text or speech, which may provide complementary cues when we are trying to index, cluster, or recognize objects.  These papers seek to leverage this cue in a number of different ways.

 

  • *“Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, British Machine Vision Conference (BMVC), 2006.  [pdf]

 

  • *Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth, in Proceedings of the European Conference on Computer Vision (ECCV), 2002.  [pdf]  [web]

 

  • Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.  [pdf]  [web]

 

  • Learning Structured Appearance Models from Captioned Images of Cluttered Scenes, by M. Jamieson, A. Fazly, S. Dickinson, S. Stevenson, and S. Wachsmuth.  In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • Clustering Web Images with Multi-modal Features, by M. Rege, M. Dong, and J. Hua, ACM Multimedia 2007.  [pdf]

 

 

 

Related links:

           

Face data from Buffy episode, from Oxford Visual Geometry Group

 

Data from Duygulu et al. paper

 

SubRip for subtitle extraction

           

---

 

Context and background knowledge in recognition

 

Many recognition systems consider snapshots of objects in isolation, both when training and testing.  But both our intuition and cognitive studies indicate that an object's greater context can be crucial to the recognition process.  These papers consider how prior external knowledge can aid in recognizing objects or categories.  The context cues may come from reasoning explicitly about the 3D environment, knowing something about the patterns of a user, learning about typical patterns of co-occurrence, or gleaning knowledge from an organized ontology.

 

  • *Putting Objects in Perspective, by D. Hoiem, A.A. Efros, and M. Hebert, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf] [web]

 

  • Objects in Context, by A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, S. Belongie, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • Visual Contextual Awareness in Wearable Computing, by T. Starner, B. Schiele, and A. Pentland.  In Proceedings of the International Symposium on Wearable Computers (ISWC), 1998.  [pdf]  [web]

 

  • *Contextual Priming for Object Detection, by A. Torralba.  International Journal of Computer Vision, 2003.  [pdf]  [web]

 

  • The Role of Context in Object Recognition, by A. Oliva and A. Torralba.  Trends in Cognitive Sciences, Vol. 11, No. 12, 2007.  [pdf]

 

 

 

  • Unsupervised Learning of Hierarchical Semantics of Objects, by D. Parikh and T. Chen, in Proceedings of the International Conference on Computer Vision (ICCV), 2007.  [pdf] [web]

 

 

Related links:

 

            WordNet

           

            Scene global feature code from Antonio Torralba

 

            MIT CSAIL database of objects and scenes

           

---

 

Learning about images from keyword-based Web search

 

Keyword-based search on the Web can be used to retrieve images (or videos) that appear near the query word, are named with the word, or are explicitly tagged with it.  Of course, this is not a completely reliable way to find images of a given object or scene, and typically an image contains much more information than can be conveyed in a few words anyhow.  Yet search engines' rapid access to large amounts of image/video content makes them an interesting resource for vision research.  These papers all consider ways to learn from the images returned by a keyword-based search, taking into account the large amount of noise in the returns.

 

  • *Learning Color Names from Real-World Images, by J. van de Weijer, C. Schmid, J. Verbeek, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • Searching the Web with Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.  [pdf]

 

  • *Learning Object Categories from Google’s Image Search, by R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005.  [pdf]  [web]

 

  • Animals on the Web, by T. Berg and D. Forsyth, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]

 

  • Keywords to Visual Categories: Multiple-Instance Learning for Weakly Supervised Object Categorization, by S. Vijayanarasimhan and K. Grauman, UTCS Technical Report, 2007.  [pdf]

 

  • Harvesting Image Databases from the Web, by F. Schroff, A. Criminisi, and A. Zisserman, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf]

 

  • Probabilistic Web Image Gathering, by K. Yanai and K. Barnard, in ACM Multimedia 2005.  [pdf]

 

 

Related links:

           

            Animals on the Web data from Berg et al.

 

            Annotated Google image data from Schroff et al. paper

           

            Color name datasets from van de Weijer et al. and feature code

 

            Google image data from Fergus et al.

 

            Flickr Commons, Library of Congress pilot project

 

            Semantic robot vision challenge  and  example data

 

---

 

Video summarization

 

How can a video be compactly presented in a visual way?  Video summarization methods attempt to abstract the main occurrences, scenes, or objects in a clip in order to provide an easily interpreted synopsis.
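
Most summarization pipelines start from shot boundaries.  Here is a classic baseline of the kind surveyed in the Lienhart paper below (the bin count and threshold are illustrative guesses, not tuned values): declare a hard cut wherever the color-histogram difference between consecutive frames spikes.

    import numpy as np

    def cut_frames(frames, n_bins=16, threshold=0.4):
        # frames: iterable of grayscale frames as 2-D uint8 arrays.
        # Returns indices where the L1 histogram difference to the previous
        # frame exceeds the threshold (hard cuts only; gradual transitions
        # such as fades and dissolves need more machinery).
        cuts, prev = [], None
        for i, f in enumerate(frames):
            h, _ = np.histogram(f, bins=n_bins, range=(0, 256))
            h = h / h.sum()
            if prev is not None and np.abs(h - prev).sum() > threshold:
                cuts.append(i)
            prev = h
        return cuts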

 

 

 

  • Video Abstraction, by J. Oh, Q. Wen, J. Lee, and S. Hwang.  In S. Deb, editor, Video Data Management and Information Retrieval, Idea Group Inc. and IRM Press, 2004.  [pdf]

 

  • Shape-Time Photography, by W. T. Freeman and H. Zhang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.  [pdf]

 

  • Video Summaries through Mosaic-Based Shot and Scene Clustering, A. Aner and J. Kender, in Proceedings of the European Conference on Computer Vision (ECCV), 2002.  [pdf]

 

  • Dynamic Stills and Clip Trailers, by Y. Caspi, A. Axelrod, Y. Matsushita, A. Gamliel.  [pdf] [web]

 

  • Reliable Transition Detection in Videos: A Survey and Practitioner’s Guide, by R. Lienhart, International Journal of Image and Graphics, 2001.  [pdf]

 

  • Recent Advances in Content-based Video Analysis, by C. Ngo, H. Zhang, and T. Pong.  International Journal of Image and Graphics, 2001.  [pdf]

 

---

 

Image and video retargeting

 

These papers cover both content-aware resizing and texture synthesis.  The general idea is to automate (or semi-automate) the process of adapting image or video inputs to a desired format, whether so they display well at a different size, or so they can be viewed continuously as if the regular spatial or temporal pattern persisted beyond the raw input.  The challenges include adapting the input so that the most interesting parts are preserved or well-represented, and, in the case of textures, generating output that looks realistically stochastic and “natural”.
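
In the spirit of the automatic thumbnail-cropping paper below (a sketch of the general idea, not that paper's exact method): given a per-pixel importance map from some saliency measure, pick the fixed-size crop window that retains the most importance, using an integral image so each window is scored in constant time.

    import numpy as np

    def best_crop(importance, crop_h, crop_w):
        # importance: (H, W) non-negative saliency map.  Returns the (y, x)
        # top-left corner of the crop_h-by-crop_w window whose total
        # importance is maximal.
        H, W = importance.shape
        ii = np.zeros((H + 1, W + 1))
        ii[1:, 1:] = importance.cumsum(axis=0).cumsum(axis=1)
        # Box sum of every candidate window via four shifted views of ii.
        scores = (ii[crop_h:H + 1, crop_w:W + 1]
                  - ii[:H - crop_h + 1, crop_w:W + 1]
                  - ii[crop_h:H + 1, :W - crop_w + 1]
                  + ii[:H - crop_h + 1, :W - crop_w + 1])
        y, x = np.unravel_index(np.argmax(scores), scores.shape)
        return int(y), int(x)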

 

  • Image Quilting for Texture Synthesis and Transfer, by A. Efros and W. Freeman, in ACM Transactions on Graphics (SIGGRAPH), 2001. [pdf] [web]

 

  • Fast Texture Synthesis using Tree-structured Vector Quantization, by L. Wei and M. Levoy, in ACM Transactions on Graphics (SIGGRAPH), 2000.  [pdf] [web]

 

 

 

  • Automatic Thumbnail Cropping and its Effectiveness, by B. Suh, H. Ling, B. Bederson, and D. Jacobs.  In Proceedings of the Symposium on User Interface Software and Technology (UIST), 2003.  [pdf]

 

  • *Non-homogeneous Content-driven Video-retargeting, by L. Wolf, M. Guttmann, and D. Cohen-Or, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf] [web]

 

  • Video Retargeting: Automating Pan and Scan, by F. Liu and M. Gleicher, in ACM Multimedia, 2006.  [pdf]

 

 

---

 

Exploring images in 3D

 

From multiple views of a scene we can create 3D representations or new renderings.  These papers propose ways to explore image content in 3D, with an emphasis on applications of doing so, such as perusing popular tourist sites from multiple users’ photos, analyzing the geometry of paintings, or editing photos based on their layers.  Some methods included here are semi-automatic.

 

 

 

  • Tour into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image, by Y. Horry, K. Anjyo, and K. Arai. ACM Transactions on Graphics (SIGGRAPH), 1997.  [pdf]

 

  • Automatic Photo-Popup, by D. Hoiem, A. Efros, and M. Hebert, ACM Transactions on Graphics (SIGGRAPH), 2005.  [pdf] [web]

 

  • *Single-View Metrology: Algorithms and Applications, by A. Criminisi, DAGM, 2002.  [pdf] [web]

 

  • Single View Metrology, A. Criminisi, I. Reid, A. Zisserman, International Journal of Computer Vision, 1999.  [pdf]

 

  • Image-Based Modeling and Photo Editing, by B. Oh, M. Chen, J. Dorsey, and F. Durand, ACM Transactions on Graphics (SIGGRAPH), 2001. [pdf]

 

 

Related links:

           

            Some PhotoTourism patch data from Microsoft Research

 

---

 

Canonical views and visualization

 

Given an object or scene, what sparse set of viewpoints best summarizes it?  This problem is related in some ways to the video summarization topic above, but here the emphasis is on visualizing photo collections, with some consideration of what is optimal for human perception.

 

  • Scene Summarization for Online Image Collections, by I. Simon, N. Snavely, and S. Seitz.  In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.  [pdf] [web]

 

  • Generating Summaries for Large Collections of Geo-referenced Photographs, by A. Jaffe, M. Naaman, T. Tassa, and M. Davis. International Conference on World Wide Web, 2006. [pdf]

 

  • Approximation of Canonical Sets and Their Applications to 2D View Simplification, by T. Denton, J. Abrahamson, A. Shokoufandeh, in Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2004.  [pdf]

 

  • What Object Attributes Determine Canonical Views?  V. Blanz, M. Tarr, H. Bulthoff.  Perception, 28(5):575-600, 1999.  [pdf]

 

  • Digital Tapestry, by C. Rother, S. Kumar, V. Kolmogorov, and A. Blake, in Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2005.  [pdf] [web]

 

  • Picture Collage, by J. Wang, J. Sun, L. Quan, X. Tang,  and H. Shum, in Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]

 

 

---

 

Shape matching

 

The shape matching problem considers how to compare shapes, often as defined in terms of their contours, silhouettes, or sampled edge points.  These papers provide different matching metrics and demonstrate the use of shape for applications like object recognition, reading warped text, detecting pedestrians, and categorization.
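
As a concrete example, the Hausdorff distance from the Huttenlocher et al. paper below scores two point sets by the worst-case nearest-point distance; the symmetric version takes the larger of the two directed distances.  A minimal sketch:

    import numpy as np

    def directed_hausdorff(A, B):
        # A, B: (n, 2) and (m, 2) arrays of sampled edge/contour points.
        # h(A, B) = max over a in A of the distance from a to its nearest b in B.
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # (n, m) pairwise
        return d.min(axis=1).max()

    def hausdorff(A, B):
        # Symmetric Hausdorff distance.
        return max(directed_hausdorff(A, B), directed_hausdorff(B, A))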

 

  • *Shape Matching and Object Recognition Using Shape Contexts, by S. Belongie, J. Malik, and J. Puzicha. Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2002.  [pdf] [web]

 

  • Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA, by G. Mori and J. Malik, in Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2003.  [pdf] [web]

 

  • Using the Inner-Distance for Classification of Articulated Shapes, by H. Ling and D. Jacobs, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.  [pdf]

 

  • Comparing Images Using the Hausdorff Distance, by D. Huttenlocher, G. Klanderman, and W. Rucklidge, Transactions on Pattern Analysis and Machine Intelligence (PAMI), 1993.  [pdf]

 

  • Pedestrian Detection from a Moving Vehicle, by D. Gavrila, Proceedings of the European Conference on Computer Vision (ECCV), 2000.  [pdf]

 

  • *A Boundary-Fragment-Model for Object Detection, by A. Opelt, A. Pinz, and A. Zisserman, Proceedings of the European Conference on Computer Vision (ECCV), 2006.  [pdf]

 

  • Hierarchical Matching of Deformable Shapes, by P. Felzenszwalb and J. Schwartz, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.  [pdf]

 

 

Related links:

 

            Matlab code for shape context features and matching

 

            MNIST handwritten digits database

 

---

 

Detecting abnormal events

 

It would be useful if a vision system that monitors video could automatically determine when something “unusual” is happening.  But how can a system be trained to recognize something it has never (or rarely) seen before?  These techniques address the problem of detecting visual anomalies.
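
One generic answer, sketched below under assumed features and parameters (an illustration of the one-class formulation, not a method from a specific paper): model only the “normal” data, and flag anything too far from it, here using the distance to the k-th nearest normal training example with a threshold calibrated on the training data itself.

    import numpy as np

    def fit_threshold(normal, k=5, quantile=0.99):
        # normal: (n, d) feature vectors from ordinary activity only (n > k).
        d = np.linalg.norm(normal[:, None, :] - normal[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)                 # ignore self-distances
        knn = np.sort(d, axis=1)[:, k - 1]          # distance to k-th neighbor
        return np.quantile(knn, quantile)           # "normal" rarely exceeds this

    def is_abnormal(x, normal, threshold, k=5):
        # Flag x if it is farther from its k-th nearest normal example
        # than almost all normal examples are from theirs.
        d = np.sort(np.linalg.norm(normal - x, axis=1))
        return d[k - 1] > threshold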

 

 

 

 

 

 

---