Learning a Tree of Metrics
with Disjoint Visual Features

Sung Ju Hwang, Kristen Grauman and Fei Sha
The University of Texas at Austin
University of Southern California


We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members' training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse, disjoint set of features to be selected for each metric relative to its ancestor (supercategory) metrics. Intuitively, this reflects that the visual cues most useful for distinguishing generic classes (e.g., feline vs. canine) should differ from those most useful for distinguishing their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach on multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of the attribute features selected by our algorithm.


A semantic taxonomy is a generalization-specialization relation graph among categories, which relates visual features to classes at different semantic granularities.


Leveraging parent-child relationships in a given semantic taxonomy, we learn a tree of metrics (ToM) that captures compact, discriminative visual features for each level.


Tree of Metrics (ToM)

Given a tree T, we learn a metric M_t for each internal (superclass) node t to discriminate among its subclasses, by minimizing a distance metric learning objective with large-margin constraints [1].
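To make the per-node objective concrete, here is a minimal NumPy sketch of a large-margin (LMNN-style [1]) loss for one node's Mahalanobis metric. The triplet format and the `node_loss` helper are assumptions for illustration, not the paper's released code.

```python
import numpy as np

def mahalanobis_dist(M, x, y):
    """Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

def node_loss(M, triplets, X, margin=1.0):
    """Large-margin loss over (i, j, k) triplets drawn only from this
    node's subtree: instance i should be closer to same-subclass j
    than to other-subclass k, by at least the margin."""
    loss = 0.0
    for i, j, k in triplets:
        pull = mahalanobis_dist(M, X[i], X[j])   # pull target neighbor close
        push = mahalanobis_dist(M, X[i], X[k])   # push impostor away
        loss += pull + max(0.0, margin + pull - push)
    return loss
```

Each internal node gets its own metric and its own triplet set, generated only among its subtree members' training instances.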

Regularization terms to learn compact discriminative metrics

Local level (sparsity regularization): apply trace-norm-based regularization at each node; promotes competition among the features within a single metric.
Global level (disjoint regularization): apply disjoint regularization between nodes; promotes competition for the features between metrics.
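The two regularizers can be sketched as follows. The trace-norm of a PSD metric reduces to its trace; the disjoint penalty below is only an illustrative form of the idea (penalize a feature carrying weight in both a metric and its ancestors), not the paper's exact regularizer.

```python
import numpy as np

def sparsity_reg(M):
    """Trace-norm of a PSD metric equals its trace; encourages a
    sparse diagonal, i.e. feature competition within one metric."""
    return float(np.trace(M))

def disjoint_reg(M_child, M_ancestors):
    """Illustrative disjoint penalty (an assumption, not the paper's
    exact form): features with large diagonal weight in both a node's
    metric and its ancestors' metrics are penalized, pushing the
    selected feature supports apart across tree levels."""
    d = np.abs(np.diag(M_child))
    pen = 0.0
    for A in M_ancestors:
        pen += float(d @ np.abs(np.diag(A)))
    return pen
```

When parent and child concentrate weight on disjoint features, the penalty vanishes, matching the intuition that coarse and fine discrimination should use different cues.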

Optimization problem

The above optimization problem is convex, but nonsmooth due to the large margin constraints. We optimize it using a subgradient solver similar to the one in [2].
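A single step of such a solver can be sketched as below: descend along a subgradient of the nonsmooth objective, then project the iterate back onto the PSD cone so it remains a valid metric. This is a simplified sketch, not the smoothing-based solver of [2].

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping
    negative eigenvalues to zero."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    return (V * np.clip(w, 0.0, None)) @ V.T

def subgradient_step(M, subgrad, lr=0.1):
    """One projected subgradient step: the objective is convex but
    nonsmooth (hinge terms), so we use a subgradient in place of a
    gradient, then restore feasibility via the PSD projection."""
    return project_psd(M - lr * subgrad)
```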

Classification with Tree of Metrics

Per-node classification: apply k-nearest-neighbor classification under the learned metrics. Hierarchical classification: perform a series of per-node classifications to determine the next node, recursively from the root down to a leaf.
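The hierarchical procedure can be sketched as a root-to-leaf walk, running k-NN under the current node's metric to pick which child subtree to descend into. The container names (`metrics`, `train_data`, `children`) are hypothetical, chosen for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(M, x, X_train, y_train, k=3):
    """k-NN vote under the node's Mahalanobis metric M."""
    diffs = X_train - x
    dists = np.einsum('nd,de,ne->n', diffs, M, diffs)  # squared distances
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

def hierarchical_classify(node, x, metrics, train_data, children):
    """Walk from the root to a leaf: at each internal node, per-node
    k-NN (with labels = child node ids) selects the subtree to enter."""
    while children.get(node):          # stop once we reach a leaf class
        X_t, y_t = train_data[node]    # node's instances, labeled by child id
        node = knn_predict(metrics[node], x, X_t, y_t)
    return node
```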

Proof of Concept

[Figure: synthetic dataset and category hierarchy; metrics learned with ToM, ToM + sparsity, and ToM + disjoint]
ToM + disjoint regularization achieves the best performance by selecting granularity-specific features for super- and subclasses.

Visual recognition experiments


Animals with Attributes (AWA): ~30K images, 50 classes
ImageNet Vehicle: ~26K images, 20 classes

Hierarchical multi-class classification accuracy

                              AWA-ATTR (predicted attributes)    Vehicle-20 (PCA-projected)
Method                        Correct label   Semantic sim.      Correct label   Semantic sim.
Euclidean                     32.4            53.6               28.5            56.1
Global LMNN [1]               32.5            53.9               29.7            53.6
Multi-metric LMNN [1]         32.3            53.7               30.0            57.9
ToM w/o disjoint sparsity     36.8            58.4               31.2            60.7
ToM + sparsity                37.6            59.3               32.1            62.7
ToM + disjoint                38.3            59.7               32.8            63.0
The table above shows multiclass classification accuracy averaged over 5 random splits (60/20/20 for training/validation/test). All our hierarchical metric learning variants outperform the flat-metric baselines, and disjoint regularization further improves accuracy over the non-regularized version.

Attributes selected from the AWA-ATTR dataset

Attributes useful for coarser-level categories are distinct from those employed to discriminate the finer-level classes.


Comparison with orthogonal transfer [3]
Orthogonal transfer: orthogonality does not necessarily imply disjoint features; the convexity of its regularizer depends critically on tuning the weight matrix K.
Tree of Metrics: learns truly disjoint features; its regularizers are convex.


[1] K. Q. Weinberger, J. Blitzer and L. K. Saul, Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS, 2006
[2] Y. Ying, K. Huang and C. Campbell, Sparse Metric Learning via Smooth Optimization, NIPS, 2009
[3] D. Zhou, L. Xiao and M. Wu, Hierarchical Classification via Orthogonal Transfer, ICML, 2011

Source codes and data

[tom.tar.gz] (183MB) (v0.9): contains MATLAB code for ToM, the data, and other utilities for taxonomies.
A C++ implementation with a MATLAB interface (v1.0) will be released soon.


Learning a Tree of Metrics with Disjoint Visual Features
Sung Ju Hwang, Fei Sha and Kristen Grauman
Advances in Neural Information Processing Systems (NIPS),
Granada, Spain, December 2011