PhD Defense: Sung Ju Hwang
Date: July 29, 2013
Time: 10:00 am
Place: GDC 4.816
Research Supervisor: Kristen Grauman
Title of dissertation: Discriminative Object Categorization with External Semantic Knowledge
Abstract:
Visual object category recognition is one of the most challenging problems in computer vision. Even assuming that we can obtain a near-perfect instance-level representation with advances in visual input devices and low-level vision techniques, object categorization remains a difficult problem because it requires drawing boundaries between instances in a continuous world, boundaries that are defined solely by human conceptualization. Object categorization is essentially a perceptual process that takes place in the human-defined semantic space.
In this semantic space, the categories reside not in isolation, but in relation to others. Some categories are similar, grouped, or co-occurring, and some are not. Despite this semantic nature of object categorization, however, most of today's automatic visual category recognition systems rely only on category labels when training discriminative recognition models with statistical machine learning techniques. In many cases, this results in the model learning incorrect associations between visual features and semantic labels, essentially overfitting to training-set biases and limiting its predictive power on new test instances.
Using semantic knowledge has great potential to benefit object category recognition, as it can guide the model toward correct associations between visual features and categories by leveraging the much richer information given in the form of inter-category or category-concept distances and structures. My goal in this thesis is to learn a discriminative model for categorization that leverages this semantic knowledge for object recognition. To this end, I explore three semantic sources, namely attributes, taxonomies, and analogies, which are incorporated into the discriminative model as a form of structural regularization. The regularization penalizes models that deviate from the structures given by the semantic knowledge.
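To make the general recipe concrete, the following is a minimal sketch, not the exact formulation from the dissertation, of a structurally regularized objective: a standard multi-class hinge loss plus a pluggable penalty that grows as the model deviates from the given semantic structure. The function and parameter names (`structurally_regularized_loss`, `semantic_penalty`, `lam`) are illustrative assumptions.

```python
import numpy as np

def structurally_regularized_loss(W, X, y, semantic_penalty, lam=0.1):
    """Empirical loss plus a penalty encoding external semantic structure.

    W                : (n_classes, n_features) linear classifier weights
    X                : (n_samples, n_features) training features
    y                : (n_samples,) integer class labels
    semantic_penalty : callable mapping W to a scalar that grows as W
                       deviates from the provided semantic structure
    """
    scores = X @ W.T                                   # (n_samples, n_classes)
    true = scores[np.arange(len(y)), y]                # score of the correct class
    margins = np.maximum(0.0, 1.0 + scores - true[:, None])
    margins[np.arange(len(y)), y] = 0.0                # ignore the correct class
    empirical = margins.max(axis=1).mean()             # multi-class hinge loss
    return empirical + lam * semantic_penalty(W)
```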
The first semantic source I explore is attributes, which are human-describable semantic characteristics of an instance. While existing work treated them as mid-level features that did not introduce new information, I focus on their potential as a means to better guide the learning of object categories, by requiring object category classifiers to share features with attribute classifiers in a multitask feature learning framework. This approach essentially discovers the common low-dimensional features that support prediction in either semantic space.
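As a hedged illustration of the shared-feature idea, the sketch below stacks category and attribute classifier weights and applies an l2,1-norm penalty, a common multitask feature learning regularizer that couples the two label spaces so they select (or drop) whole feature dimensions together. The dissertation's exact formulation may differ; all names here are illustrative.

```python
import numpy as np

def shared_feature_penalty(W_categories, W_attributes):
    """l2,1-norm over the stacked category and attribute classifiers.

    W_categories : (n_categories, n_features) category classifier weights
    W_attributes : (n_attributes, n_features) attribute classifier weights

    Each column of the stacked matrix corresponds to one feature dimension;
    the inner l2 norm couples all tasks on that dimension, and the outer sum
    acts like an l1 penalty that switches whole dimensions on or off jointly.
    """
    W = np.vstack([W_categories, W_attributes])        # (n_tasks, n_features)
    return np.sum(np.sqrt(np.sum(W ** 2, axis=0)))     # sum of per-feature l2 norms
```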
Next, I move on to semantic taxonomies, another valuable source of semantic knowledge. The merging and splitting criteria for the categories in a taxonomy reflect human-defined criteria, and I aim to exploit this implicit semantic knowledge. Specifically, I propose a tree of metrics (ToM) that learns metrics capturing granularity-specific similarities at the different nodes of a given semantic taxonomy, using a regularizer to isolate granularity-specific, disjoint features. This approach captures the intuition that the features used to discriminate a parent class should differ from those used for its child classes. The learned metrics can then be used for hierarchical classification.
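The following sketch conveys the disjointness intuition under my own simplifying assumptions (a nonnegative, diagonal metric per node): an l1 term encourages each node to be sparse, and an overlap term discourages reusing feature dimensions already claimed by ancestor nodes. It is meant as an illustration, not the dissertation's exact ToM regularizer, and the names are hypothetical.

```python
import numpy as np

def disjoint_regularizer(node_weights, ancestor_weights,
                         lam_sparsity=1.0, lam_disjoint=1.0):
    """Illustrative per-node regularizer for a tree of metrics.

    node_weights     : (n_features,) diagonal metric weights at this node
    ancestor_weights : list of (n_features,) diagonal metrics on the path to the root
    """
    sparsity = lam_sparsity * np.sum(np.abs(node_weights))      # keep each node sparse
    overlap = sum(np.dot(np.abs(node_weights), np.abs(a))       # penalize reusing
                  for a in ancestor_weights)                    # ancestors' dimensions
    return sparsity + lam_disjoint * overlap
```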
A single taxonomy can be limiting, in that its structure may not be optimal for hierarchical classification, and there may exist no single semantic taxonomy that aligns well with visual relationships. Thus, I propose to overcome this limitation by leveraging multiple taxonomies as semantic sources and combining the complementary information they provide across multiple semantic views and granularities. This allows us, for example, to synthesize semantics from both 'Biological' and 'Appearance'-based taxonomies when learning the visual features.
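One simple way to picture combining several taxonomies is as a weighted combination of base kernels, one per taxonomy (or per taxonomy granularity), with the weights ideally learned discriminatively in a multiple-kernel-learning fashion. The sketch below shows only the fixed-weight combination step; it is an assumption-laden simplification rather than the dissertation's method, and all names are illustrative.

```python
import numpy as np

def combine_taxonomy_kernels(kernels, weights):
    """Convex combination of base kernels derived from different taxonomies.

    kernels : list of (n, n) kernel matrices, one per taxonomy or granularity
    weights : nonnegative mixing weights, normalized here to sum to one
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))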
Finally, moving beyond the pairwise similarities used in the previous two models toward more complex semantic relations, I exploit analogies, which encode the relational similarity between two related pairs of categories. Specifically, I use analogies to regularize a discriminatively learned semantic embedding space for categorization, such that the displacement between the category embeddings is the same in both pairs of the analogy. Such a constraint allows a more confusable pair of categories to benefit from the clear separation of a matched pair that shares the same relation.
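A minimal sketch of an analogy-based penalty, assuming category embeddings stored as rows of a matrix: for each analogy "a is to b as c is to d", penalize the squared difference between the two displacement vectors. This is one simple instantiation of the displacement-matching idea; the exact form used in the dissertation may differ, and the names are illustrative.

```python
import numpy as np

def analogy_penalty(E, analogies):
    """Penalty encouraging parallel displacements for analogous category pairs.

    E         : (n_categories, d) matrix of category embeddings, one row per category
    analogies : list of (a, b, c, d) index tuples meaning "a is to b as c is to d"
    """
    total = 0.0
    for a, b, c, d in analogies:
        diff = (E[a] - E[b]) - (E[c] - E[d])   # mismatch between the two displacements
        total += float(diff @ diff)
    return total
```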
All of these methods are evaluated on challenging public datasets and are shown to effectively improve recognition accuracy over purely discriminative models. Further, the proposed methods are not limited to visual object categorization in computer vision; they apply to any classification problem in which there exists domain knowledge about the relationships between the classes. Possible applications include document classification in natural language processing and protein classification in computational biology.