Evaluating our Approach of Building Representations from Reusable Components

As described in Clark and Porter's AAAI-97 paper, our approach to building computational representations of complex phenomena is to assemble and customize a set of reusable, generic components. We propose to evaluate our approach beginning in the first year of our project, even before finishing our core technology of:

We will design the evaluation to provide early feedback based on our preliminary results.

There are three claims about our approach that we propose to evaluate. First, we define a component as a set of axioms and claim that each of our components describes a consensus view of a common concept, as opposed to a view that is idiosyncratic to our project. Second, we claim that our components can be frequently reused, with little modification, to build knowledge bases. This claim is unjustified if we find that components are infrequently reused, or their reuse requires significant modification. Third, we claim that our components can be assembled to represent domain knowledge well. In contrast, if we find that much of the information coded in a knowledge base is not supplied by components, or that the component assemblies are incomplete or inconsistent, then the claim is unjustified.

We will evaluate these claims while building a component library for both generic concepts and concepts related to microbiology (the domain for the Textbook Knowledge Challenge Problem, or TKCP).

We propose to evaluate the first claim (that our components each represent a consensus view) in the following way. First, to avoid having to train subject matter experts (SMEs) to understand our formal language, we will translate the content of our components (i.e., the axioms) into English, yielding a description of each one. Next, we will have human subjects name the concept that most closely fits each description. For example, the component for Locomotion is defined by two axioms, which would translate as: "a type of Move in which the agent and object are the same Tangible-Entity." The subjects would be given this description, plus the slot dictionary, which defines agent, object, and so on.

We propose to use WordNet to measure the "distance" between the term used in the component library (e.g. "Locomotion") and the term assigned by each subject. From this feedback we will determine the degree to which our axiomatization of components matches people's commonsense view of the concepts.

Finally, we will present each subject with the "correct" term, as used in the component library, and invite him/her to revise the description so that it better matches the term. For example, a subject might add or remove a phrase from the description of Locomotion, which would correspond to changing the set of axioms. From this feedback, we will improve our axiomatization of components.
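The distance measurement described above can be illustrated with a minimal sketch. It assumes that "distance" means the number of is-a links on the shortest path between two terms, which is only one of several possible WordNet-based measures. The small taxonomy below is a hypothetical stand-in for WordNet, and the function names are illustrative rather than part of any existing tool.

```python
from collections import deque

# Hypothetical is-a (hypernym) links standing in for WordNet: child -> parent.
TOY_HYPERNYMS = {
    "locomotion": "move",
    "travel": "move",
    "transfer": "move",
    "walk": "locomotion",
    "move": "event",
}


def path_distance(term_a, term_b, hypernyms=TOY_HYPERNYMS):
    """Return the number of is-a edges on the shortest path between two terms.

    Returns None when either term is missing from the taxonomy.
    """
    if term_a == term_b:
        return 0
    # Treat each is-a link as an undirected edge.
    neighbors = {}
    for child, parent in hypernyms.items():
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)
    if term_a not in neighbors or term_b not in neighbors:
        return None
    # Breadth-first search finds the shortest path in an unweighted graph.
    queue = deque([(term_a, 0)])
    seen = {term_a}
    while queue:
        term, dist = queue.popleft()
        if term == term_b:
            return dist
        for nxt in neighbors[term]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def mean_distance(library_term, subject_terms):
    """Average distance from the library term to the terms subjects assigned.

    Terms that are not in the taxonomy are skipped (an assumption of this sketch).
    """
    distances = [path_distance(library_term, t) for t in subject_terms]
    known = [d for d in distances if d is not None]
    return sum(known) / len(known) if known else None


if __name__ == "__main__":
    answers = ["locomotion", "walk", "travel"]
    print(mean_distance("locomotion", answers))
```

A low mean distance for a component would suggest that its description evokes the intended concept; a high one would flag the component for revision.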

We propose to evaluate the second claim (that our components can be frequently reused with little modification) by measuring the amount of reuse, both within the component library and within knowledge representations built by Knowledge Engineers (KEs). Furthermore, we will count the modifications made for each instance of reuse, and weight each modification by its severity. The reuse metrics developed for the HPKB project should be appropriate here, too. (Note: we considered measuring reuse in knowledge representations built by SMEs, rather than by KEs. Clearly, this is paramount in the evaluation of the End-to-End systems built by the RKF project. However, the evaluation we are describing here focuses on just one technology (Components) and will ascertain the potential for reuse when the technology is used well. This will provide a useful target, or baseline, for evaluating the ability of SMEs to build knowledge bases down the road.)
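As a rough illustration of the bookkeeping this implies, the sketch below counts how often each component is reused and sums severity-weighted modifications per instance of reuse. It is not the HPKB metric; the severity labels, their weights, and all names are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical severity weights; the real scale would be set by the evaluators.
SEVERITY_WEIGHTS = {"minor": 1.0, "moderate": 3.0, "major": 10.0}


@dataclass
class ReuseInstance:
    """One use of a library component in a knowledge representation."""
    component: str
    # Severity label of each modification made for this reuse.
    modifications: list = field(default_factory=list)


def reuse_counts(instances):
    """Count how many times each component was reused."""
    return Counter(inst.component for inst in instances)


def weighted_modification_cost(instance, weights=SEVERITY_WEIGHTS):
    """Sum the severity weights of the modifications for one reuse instance."""
    return sum(weights[severity] for severity in instance.modifications)


def mean_cost_per_reuse(instances, weights=SEVERITY_WEIGHTS):
    """Average weighted modification cost over all reuse instances."""
    if not instances:
        return 0.0
    total = sum(weighted_modification_cost(inst, weights) for inst in instances)
    return total / len(instances)


if __name__ == "__main__":
    log = [
        ReuseInstance("Move"),
        ReuseInstance("Move", ["minor"]),
        ReuseInstance("Locomotion", ["minor", "major"]),
    ]
    print(reuse_counts(log))
    print(mean_cost_per_reuse(log))
```

Under this scheme the claim would be supported by high reuse counts together with a low mean cost per reuse.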

We propose to evaluate the third claim (that our components can be assembled to represent domain knowledge well) in the following way. For each topic represented in a knowledge base built by an SME, we will measure how much of its representation was provided by our components; the balance is what was coded specifically for that topic. Furthermore, note that the proposed evaluation of the TKCP will measure the quality of the overall representation, in terms of consistency, completeness, support for inference, and so on. In that context, we propose to analyze the KB's successes and failures at the TKCP to determine how much responsibility lies with our components (as opposed to other aspects of the KB system). In particular, we are interested to see whether there is a tradeoff between the quality of a knowledge representation and the degree to which it is built by reuse.
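The per-topic measurement could be computed along the following lines. This sketch assumes that the representation of a topic can be broken into axioms and that each axiom is tagged with the component that supplied it, or with None if it was coded by hand for that topic. Both the unit of measurement and the names are assumptions.

```python
def component_coverage(axiom_sources):
    """Fraction of a topic's axioms that were supplied by library components.

    axiom_sources holds one entry per axiom: the name of the supplying
    component, or None if the axiom was coded specifically for the topic.
    Returns None for a topic with no axioms.
    """
    if not axiom_sources:
        return None
    from_components = sum(1 for source in axiom_sources if source is not None)
    return from_components / len(axiom_sources)


def coverage_by_topic(knowledge_base):
    """Map each topic name to its component coverage."""
    return {
        topic: component_coverage(sources)
        for topic, sources in knowledge_base.items()
    }


if __name__ == "__main__":
    # Hypothetical topics and axiom origins, for illustration only.
    kb = {
        "topic-a": ["Move", "Move", None, "Locomotion"],
        "topic-b": [None, None],
    }
    print(coverage_by_topic(kb))
```

Pairing each topic's coverage with its TKCP quality scores would then give the data needed to look for the tradeoff between quality and reuse.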

Created and maintained by Bruce Porter.
Last modified September 8, 2000.