Image Tags

Accounting for the Relative Importance of Objects in Image Retrieval - BMVC 2010

We introduce a method for image retrieval that leverages the implicit information about object importance conveyed by the list of keyword tags a person supplies for an image. We propose an unsupervised learning procedure based on Kernel Canonical Correlation Analysis that discovers the relationship between how humans tag images (e.g., the order in which words are mentioned) and the relative importance of objects and their layout in the scene. Using this discovered connection, we show how to boost accuracy for novel queries, such that the search results may more closely match the user's mental image of the scene being sought.
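
As a rough illustration of the core machinery, the sketch below implements regularized kernel CCA via the standard eigenproblem formulation (in the style of Hardoon et al.); it is not the paper's exact objective, and the tag-feature construction, kernel choices, variable names, and all parameter values are placeholders.

```python
import numpy as np
from scipy.linalg import eig, solve

def center_kernel(K):
    """Double-center a kernel matrix (zero-mean in feature space)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, reg=1e-2, n_components=2):
    """Regularized kernel CCA via the standard eigenproblem.

    Kx: kernel over visual features; Ky: kernel over tag features
    (e.g., rank-weighted word counts encoding order of mention).
    Returns dual weights (alphas, betas) and canonical correlations.
    """
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    Rx = Kx + reg * np.eye(n)
    Ry = Ky + reg * np.eye(n)
    # (Kx + reg I)^-1 Ky (Ky + reg I)^-1 Kx alpha = rho^2 alpha
    M = solve(Rx, Ky) @ solve(Ry, Kx)
    vals, vecs = eig(M)
    order = np.argsort(-vals.real)[:n_components]
    alphas = vecs[:, order].real
    rhos = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
    betas = solve(Ry, Kx @ alphas) / np.maximum(rhos, 1e-12)
    return alphas, betas, rhos

# Toy usage with random stand-ins for the two views.
rng = np.random.default_rng(0)
V = rng.normal(size=(50, 128))   # visual features (placeholder)
T = rng.normal(size=(50, 30))    # tag-order features (placeholder)
alphas, betas, rhos = kcca(V @ V.T, T @ T.T)
```

Projecting a novel query into the learned common space then lets retrieval rank images by importance-aware similarity rather than by visual features alone.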

Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags - CVPR 2010

Current uses of tagged images typically exploit only the most explicit information: the link between the nouns named and the objects present somewhere in the image. We propose to leverage "unspoken" cues that rest within an ordered list of image tags so as to improve object localization. We define three novel implicit features from an image's tags: the relative prominence of each object as signified by its order of mention, the scale constraints implied by unnamed objects, and the loose spatial links hinted at by the proximity of names on the list. By learning a conditional density over the localization parameters (position and scale) given these cues, we show how to improve both accuracy and efficiency when detecting the tagged objects.
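
One simple way to realize a conditional density over (position, scale) given tag cues is a kernel-weighted Parzen estimate, sketched below; the paper's actual density model and feature encodings may differ, and every function name, bandwidth, and shape here is a hypothetical stand-in.

```python
import numpy as np

def localization_prior(cues_train, locs_train, cue_query,
                       cue_bw=1.0, loc_bw=0.1):
    """Kernel-weighted Parzen estimate of p(x, y, scale | tag cues).

    cues_train: (n, d) implicit tag features per training image
    locs_train: (n, 3) normalized (x, y, log-scale) of the target object
    cue_query:  (d,) implicit features of the novel image's tag list
    Returns a function scoring candidate (x, y, log-scale) windows.
    """
    diff = cues_train - cue_query
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / cue_bw ** 2)
    w /= w.sum() + 1e-12  # weight training images by cue similarity

    def score(candidates):
        # candidates: (m, 3); higher score = more plausible window
        d2 = ((candidates[:, None, :] - locs_train[None, :, :]) ** 2).sum(-1)
        return (w[None, :] * np.exp(-0.5 * d2 / loc_bw ** 2)).sum(axis=1)

    return score
```

Ranking sliding-window candidates by such a score before running the object detector is one way the cues could cut both search time and false positives.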

Discriminative Learning with Semantic Regularization

Sharing Features Between Objects and Their Attributes - CVPR 2011

Visual attributes expose human-defined semantics to object recognition models, but existing work largely restricts their influence to mid-level cues during classifier training. Rather than treat attributes as intermediate features, we consider how learning visual properties in concert with object categories can regularize the models for both. Given a low-level visual feature space together with attribute and object-labeled image data, we learn a shared lower-dimensional representation by optimizing a joint loss function that favors common sparsity patterns across both types of prediction tasks. We use a kernelized formulation of convex multi-task feature learning, in which one alternates between learning the common features and learning task-specific classifier parameters on top of those features.
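
The alternating scheme can be sketched in its linear, squared-loss form, following the convex multi-task feature learning template of Argyriou et al.; the kernelization, the actual loss, and the hyperparameters used in the paper are abstracted away, so treat this as a toy version.

```python
import numpy as np
from scipy.linalg import solve, sqrtm

def multitask_feature_learning(Xs, ys, gamma=1.0, n_iter=20, eps=1e-6):
    """Alternating minimization for convex multi-task feature learning.

    Xs: list of (n_t, d) feature matrices, one per task (here, object
        and attribute prediction tasks over the same visual features)
    ys: list of (n_t,) targets for each task
    Returns per-task weights W (d x T) and the shared structure D.
    """
    d, T = Xs[0].shape[1], len(Xs)
    D = np.eye(d) / d                     # shared feature structure
    W = np.zeros((d, T))
    for _ in range(n_iter):
        Dinv = np.linalg.pinv(D + eps * np.eye(d))
        for t, (X, y) in enumerate(zip(Xs, ys)):
            # ridge-style solve of min ||Xw - y||^2 + gamma w' D^-1 w
            W[:, t] = solve(X.T @ X + gamma * Dinv, X.T @ y)
        C = sqrtm(W @ W.T + eps * np.eye(d)).real
        D = C / np.trace(C)               # trace-normalized update
    return W, D
```

Object classifiers and attribute predictors enter simply as separate tasks over the same feature space; the shared matrix D is what couples them and induces the common sparsity pattern.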

Learning a Tree of Metrics with Disjoint Visual Features - NIPS 2011

We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, the Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members' training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers each metric to select a sparse set of features disjoint from those selected by its ancestor (supercategory) metrics.
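
A toy version of one node's learning problem, using diagonal metrics, triplet hinge constraints from subtree instances, and an added penalty that discourages reusing features the ancestors already selected, might look like the following; the paper's actual formulation and optimization differ, and all names, losses, and step sizes here are illustrative.

```python
import numpy as np

def learn_node_metric(X, triplets, d_ancestors, lam1=0.1, lam2=0.1,
                      lr=0.01, n_iter=200):
    """Projected-subgradient learner for one node's diagonal metric.

    X: (n, p) features; triplets: (i, j, k) with i, j in the same
    subcategory and i, k in different ones, drawn only from this
    node's subtree members. d_ancestors is the sum of the ancestors'
    diagonal metrics; the lam2 term penalizes features they already use.
    """
    p = X.shape[1]
    d = np.ones(p)
    for _ in range(n_iter):
        grad = lam1 * np.ones(p) + lam2 * d_ancestors  # sparse + disjoint
        for i, j, k in triplets:
            dij = (X[i] - X[j]) ** 2
            dik = (X[i] - X[k]) ** 2
            if 1.0 + dij @ d - dik @ d > 0:  # margin violated
                grad += dij - dik
        d = np.maximum(d - lr * grad, 0.0)   # keep the metric PSD
    return d
```

Applying this node by node, top-down over the taxonomy while passing along the accumulated ancestor metrics, yields the tree of metrics.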

Semantic Kernel Forests from Multiple Taxonomies - NIPS 2012

When learning features for complex visual recognition problems, labeled image exemplars alone can be insufficient. While an object taxonomy specifying the categories' semantic relationships could bolster the learning process, not all relationships are relevant to a given visual classification task, nor does a single taxonomy capture all ties that are relevant. In light of these issues, we propose a discriminative feature learning approach that leverages multiple hierarchical taxonomies representing different semantic views of the object categories. For each taxonomy, we first learn a tree of semantic kernels, where each node has a Mahalanobis kernel optimized to distinguish between the classes in its children nodes. Then, using the resulting semantic kernel forest, we learn class-specific kernel combinations that select only those relationships relevant to recognizing each object class. To learn the combination weights, we introduce a novel hierarchical regularization term that further exploits the taxonomies' structure.
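
As a loose illustration of class-specific kernel combination, the sketch below scores each node kernel by centered kernel-target alignment and shrinks a child's weight by its parent's; the alignment-based weighting is a stand-in for the paper's MKL objective and hierarchical regularizer, not the actual method, and the names and parameters are invented for the example.

```python
import numpy as np

def center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K, y):
    """Centered kernel-target alignment of kernel K with labels y."""
    Ky = np.outer(y, y)
    Kc = center(K)
    return (Kc * Ky).sum() / (np.linalg.norm(Kc) * np.linalg.norm(Ky) + 1e-12)

def forest_weights(kernels, parents, y, lam=0.5):
    """Class-specific weights over a semantic kernel forest.

    kernels: node kernels gathered from all taxonomies' trees
    parents: parents[i] = index of kernel i's parent node, or -1 at a root
    lam: a child keeps only the alignment it adds over lam times its
         parent's, mimicking a preference for higher-level kernels.
    """
    a = np.array([alignment(K, y) for K in kernels])
    w = a.copy()
    for i, p in enumerate(parents):
        if p >= 0:
            w[i] = max(a[i] - lam * a[p], 0.0)
    w = np.maximum(w, 0.0)
    return w / (w.sum() + 1e-12)

# Combined kernel for one class: sum(w[i] * kernels[i] for all i),
# usable with any kernel classifier (e.g., an SVM).
```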

Automatic Image Enhancement

Context-Based Local Image Enhancement