Semantic Kernel Forests from Multiple Taxonomies

Sung Ju Hwang, Fei Sha and Kristen Grauman
The University of Texas at Austin
University of Southern California


We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among the training instances of its subtree members. To further exploit the semantics, we introduce a novel regularizer that couples the metrics and prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) metrics. Intuitively, this reflects that the visual cues most useful for distinguishing generic classes (e.g., feline vs. canine) should differ from those most useful for distinguishing their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach on multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of the attribute features selected by our algorithm.


The conventional use of a semantic taxonomy in object categorization is limited in two ways:
1) The structure is not always optimal for hierarchical classification.
2) There exists no single 'optimal' taxonomy.


We want to focus on the implicit information provided by the taxonomy, that is, the criteria used to classify subclasses at different semantic granularities [1].
Different taxonomies provide complementary information, which could be exploited to learn discriminative visual features. Instead of focusing on a single taxonomy and using it directly for hierarchical classification, we exploit multiple taxonomies for feature learning and combine the learned features in a non-hierarchical way.


Our method consists of two steps:
1) Learning granularity-specific discriminative semantic features.
2) Combining features across different semantic views and granularities.

Isolating granularity-specific discriminative features from multiple taxonomies

To learn granularity-specific discriminative features on each taxonomy, we use a Tree of Metrics (ToM) [1], which learns hierarchically structured metrics that focus on different features at each node.
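The following minimal Python sketch shows how ToM restricts each node's learning problem to its own subtree. It assumes NumPy and stores the taxonomy as a hypothetical parent-to-children dict. The code does three things:
- gathers the training samples under one internal node,
- relabels them by the child subtree they fall under,
- enumerates similar/dissimilar pairs among them.

All helper names and the toy taxonomy are illustrative. Learning the metric itself (ToM's actual optimization with its disjoint-feature regularizer) is not shown.

```python
from itertools import combinations
import numpy as np

def leaves_under(children, node):
    """Return the set of leaf classes in the subtree rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    leaves = set()
    for kid in kids:
        leaves |= leaves_under(children, kid)
    return leaves

def node_training_set(children, node, X, y):
    """Keep only the samples whose class lies under `node`, relabeled by the
    index of the child subtree that contains their class."""
    idx, labels = [], []
    for child_id, child in enumerate(children[node]):
        members = leaves_under(children, child)
        for i, cls in enumerate(y):
            if cls in members:
                idx.append(i)
                labels.append(child_id)
    return X[idx], np.array(labels)

def constraint_pairs(labels):
    """Pairs under the same child are 'similar'; pairs under different children are 'dissimilar'."""
    similar, dissimilar = [], []
    for a, b in combinations(range(len(labels)), 2):
        (similar if labels[a] == labels[b] else dissimilar).append((a, b))
    return similar, dissimilar

def mahalanobis_sq(x, z, M):
    """Squared Mahalanobis distance (x - z)^T M (x - z) under a node metric M."""
    d = x - z
    return float(d @ M @ d)

# Hypothetical toy taxonomy: feline vs. canine at the root, fine-grained leaves below.
children = {"animal": ["feline", "canine"],
            "feline": ["persian", "siamese"],
            "canine": ["dalmatian", "wolf"]}
y = ["persian", "siamese", "dalmatian", "wolf", "persian"]
X = np.arange(10, dtype=float).reshape(5, 2)

# The "feline" node sees only the three cat samples; dog samples never enter its constraints.
X_feline, labels_feline = node_training_set(children, "feline", X, y)
similar, dissimilar = constraint_pairs(labels_feline)
```

In this toy example, the "feline" metric is constrained only by Persian-vs-Siamese pairs. The root metric sees all five samples, relabeled as feline (0) or canine (1).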

Combining features at different semantic views and granularities

After isolating the per-granularity discriminative semantic features at each node, we combine them to learn an optimal kernel for each category.

Semantic Kernel Forest

We first construct an RBF kernel at each node, using the metric learned for that node by ToM. We call the resulting set of kernels a semantic kernel forest.
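A per-node RBF kernel can be built by plugging the node's Mahalanobis metric into the usual Gaussian form, k(x, z) = exp(-gamma (x - z)^T M (x - z)). The sketch below rests on three assumptions, none of which come from the released MATLAB code:
- NumPy as the array library.
- A single bandwidth `gamma` shared by all nodes.
- A hypothetical dict keyed by (taxonomy, node).

```python
import numpy as np

def node_rbf_kernel(X, Z, M, gamma=1.0):
    """Gram matrix k(x, z) = exp(-gamma * (x - z)^T M (x - z)) for a node metric M."""
    diff = X[:, None, :] - Z[None, :, :]                # pairwise differences, shape (n, m, d)
    d2 = np.einsum("nmi,ij,nmj->nm", diff, M, diff)     # squared Mahalanobis distances, shape (n, m)
    return np.exp(-gamma * d2)

def build_kernel_forest(X, Z, metrics, gamma=1.0):
    """One kernel per internal node of every taxonomy; `metrics` maps (taxonomy, node) -> M."""
    return {key: node_rbf_kernel(X, Z, M, gamma) for key, M in metrics.items()}
```

If a metric is factored as M = L^T L, the same kernel is an ordinary RBF kernel on the projected features Lx, because (x - z)^T L^T L (x - z) = ||Lx - Lz||^2. Computing it on Lx is usually cheaper than forming all pairwise differences in the original space.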

Learning per-category kernel from Semantic Kernel Forest using MKL

Having isolated semantic, discriminative features at each node, we combine them additively using multiple kernel learning (MKL). For each category, we only consider the kernels at the nodes on the path from the root to that category in each taxonomy.
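The per-category combination can be sketched as a weighted sum over the kernels on the category's path in every taxonomy, K_c = sum_n beta_n K_n. In this sketch the weights beta_n are simply given as input; in MKL they would be learned jointly with the classifier. Each taxonomy is stored as a hypothetical child-to-parent dict, and the function names are illustrative.

```python
import numpy as np

def path_to_root(parent, leaf):
    """Internal nodes on the path from the root down to the leaf's parent (root first)."""
    path, node = [], parent.get(leaf)
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path[::-1]

def category_kernel(forest, parents, category, weights):
    """K_c = sum over taxonomies t and path nodes n of beta_(t,n) * K_(t,n).
    `forest` and `weights` are keyed by (taxonomy, node); `parents` maps taxonomy -> child-to-parent dict."""
    K = None
    for tax, parent in parents.items():
        for node in path_to_root(parent, category):
            term = weights[(tax, node)] * forest[(tax, node)]
            K = term if K is None else K + term
    return K
```

Kernels at nodes off the category's path (e.g., the "canine" node for a cat class) never enter that category's combination. This keeps each per-category MKL problem small.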

Sparse hierarchical regularization

L1 regularization is usually applied in the original MKL formulation, since not all kernels may be relevant and we want to select the useful ones. Semantic kernel forests inevitably produce redundant kernels for discriminating between two classes. In such cases, we want to favor higher-level kernels, as they have more generalization power.
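One way to encode the preference for higher-level kernels is to add a hinge penalty to the L1 term. The penalty charges a child kernel whenever its weight exceeds its parent's. This is purely an illustrative assumption, not the paper's exact regularizer. The Python sketch below evaluates such a penalty; `lam_sparse` and `lam_hier` are hypothetical trade-off parameters.

```python
def hierarchical_penalty(weights, parent, lam_sparse=1.0, lam_hier=1.0):
    """Illustrative regularizer (an assumption, not the paper's exact formula):
    lam_sparse * sum_n |beta_n| + lam_hier * sum_n max(0, beta_n - beta_parent(n)).
    `weights` maps node -> kernel weight beta_n; `parent` maps node -> parent node."""
    sparse_term = sum(abs(b) for b in weights.values())
    # Hinge term: charged only when a child kernel outweighs its parent kernel.
    hier_term = sum(max(0.0, b - weights[parent[n]])
                    for n, b in weights.items()
                    if parent.get(n) in weights)
    return lam_sparse * sparse_term + lam_hier * hier_term

parent = {"feline": "animal"}
print(hierarchical_penalty({"animal": 0.6, "feline": 0.2}, parent))  # parent outweighs child
print(hierarchical_penalty({"animal": 0.2, "feline": 0.6}, parent))  # child outweighs parent
```

The two weightings have the same L1 norm, so an L1 term alone scores them equally. Only the hinge term makes the parent-heavy weighting cheaper, which is the kind of preference the text describes.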



We validate our method on three datasets: AWA-4 (4 categories, used for illustration), AWA-10, and ImageNet-20.

Dataset | AWA-10 | ImageNet-20
# images | 6,180 | 28,957
# classes | 10 | 20
Granularity | Fine-grained | Coarser-grained

Multiclass classification

Method | Description | AWA-4 (%) | AWA-10 (%) | ImageNet-20 (%)
Raw feature kernel | An RBF kernel computed on the original features | 47.67 ± 2.22 | 30.80 ± 1.36 | 28.20 ± 1.45
Raw feature kernel + MKL | MKL combination of RBF kernels constructed by varying gamma | 48.50 ± 1.89 | 31.13 ± 2.31 | 27.57 ± 1.50
Perturbed semantic kernel tree + MKL-H | A semantic kernel tree trained with taxonomies that have randomly swapped leaves | N/A | 31.53 ± 2.07 | 28.20 ± 2.02
Perturbed semantic kernel forest + MKL-H | A semantic kernel forest trained with taxonomies that have randomly swapped leaves | N/A | 33.20 ± 2.96 | 30.77 ± 1.53
Semantic kernel tree + Avg | An equal-weight average of the semantic kernels from one taxonomy | 47.17 ± 2.40 | 31.92 ± 1.21 | 28.97 ± 1.61
Semantic kernel tree + MKL | The same kernels, combined with MKL using sparsity regularization only | 48.89 ± 1.06 | 32.43 ± 1.93 | 29.74 ± 1.26
Semantic kernel tree + MKL-H | The same as above, plus the proposed hierarchical regularization | 50.06 ± 1.12 | 32.68 ± 1.79 | 29.90 ± 0.70
Semantic kernel forest + MKL | Semantic forest kernels from multiple taxonomies, combined with MKL | 49.67 ± 1.11 | 34.60 ± 1.78 | 30.97 ± 1.14
Semantic kernel forest + MKL-H | The same as above, plus our hierarchical regularizer | 52.83 ± 1.68 | 35.87 ± 1.22 | 32.30 ± 1.00
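To put the headline numbers in perspective, the short computation below takes the mean accuracies from the table. It computes the absolute gain of the semantic kernel forest + MKL-H over two baselines:
- the raw feature kernel,
- the single semantic kernel tree + MKL-H.

Only the reported means are used; the ± spreads are ignored.

```python
# Mean accuracies (%) copied from the table above; the ± spreads are not used.
raw_kernel = {"AWA-4": 47.67, "AWA-10": 30.80, "ImageNet-20": 28.20}
tree_mkl_h = {"AWA-4": 50.06, "AWA-10": 32.68, "ImageNet-20": 29.90}
forest_mkl_h = {"AWA-4": 52.83, "AWA-10": 35.87, "ImageNet-20": 32.30}

def absolute_gain(better, baseline):
    """Per-dataset difference in mean accuracy (percentage points), rounded to two decimals."""
    return {name: round(better[name] - baseline[name], 2) for name in baseline}

gain_over_raw = absolute_gain(forest_mkl_h, raw_kernel)
gain_over_tree = absolute_gain(forest_mkl_h, tree_mkl_h)
for name in raw_kernel:
    print(f"{name}: +{gain_over_raw[name]:.2f} vs. raw kernel, +{gain_over_tree[name]:.2f} vs. single tree")
```

On all three datasets, the forest gains about 4.1 to 5.2 points over the raw feature kernel. Over the single-taxonomy tree with the same regularizer, it gains about 2.4 to 3.2 points.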

Per-class results

The two per-class plots show the accuracy improvement of each individual taxonomy and of the semantic kernel forest ("All") over the raw feature kernel baseline.
A single semantic kernel tree often improves accuracy on some classes at the expense of others (e.g., on AWA-10 the Habitat taxonomy helps to distinguish the humpback whale, but hurts accuracy for the giant panda). The semantic kernel forest, on the other hand, takes the best of each taxonomy through its learned combination.

Qualitative Analysis

Confusion matrix on 4 animal classes

The top three confusion matrices are obtained using only the top-level kernel from each taxonomy, and the bottom one is obtained by combining these kernels with MKL. Each taxonomy-derived kernel reduces confusion between some categories but adds confusion to others, whereas the combined kernel reduces confusion across all categories by drawing complementary information from each.

Learnt kernel weights

The figure shows the kernel weights learned for each category on AWA-10. With the hierarchical regularization, higher-level kernels are selected, which results in an accuracy improvement.


[1] Sung Ju Hwang, Kristen Grauman and Fei Sha. Learning a Tree of Metrics with Disjoint Visual Features. NIPS 2011.

Source code and data

[kernelforest.tar.gz] 64 MB. Contains MATLAB code (v0.9) for both the tree of metrics and semantic kernel forests, along with the data.
v1.0 in C++ with OpenMP for parallel classifier training will be released soon.
[taxonomyutil.tar.gz] Contains MATLAB code to generate taxonomies from WordNet, as well as taxonomy data: 4 taxonomies for AWA-10 and 3 taxonomies for ImageNet-20.


Sung Ju Hwang, Kristen Grauman and Fei Sha. Semantic Kernel Forests from Multiple Taxonomies.
Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, December 2012.