Using Explanation-Based and Empirical Methods in Theory Revision (1991)

The knowledge acquisition problem is a continuing problem in expert system development. The knowledge base (domain theory) initially formulated by the expert is usually only an approximation to the correct theory for the application domain. This initial knowledge base must be refined (usually manually) as problems are discovered. This research addresses the knowledge base refinement problem for classification tasks. The research provides an automatic method for correcting a domain theory in the light of incorrect performance on a set of training examples. The method uses attempted explanations to focus the correction on the failing part of the knowledge base. It then uses induction to supply a correction to the knoweledge base that will render it consistent with the training examples

Using this technique, it is possible to correct overly general and overly specific theories, theories with multiple faults at various levels in the theory hierarchy, and theories involving multiple concepts. Methods have been developed for making corrections even in the presence of noisy data. Theoretical justification for the method is given in the form of convergence results that predict that the method will eventually converge to a hypothesis that is within a small error of the correct hypothesis, given sufficient examples. Because the technique currently relies on theorem proving for much of the analysis, it is quite expensive to computationally and heuristic methods for reducing the computational burden have been implemented.

The system developed as part of the research is called EITHER (Explanation-based Inductive THeory Extension and Revision). EITHER uses propositional Horn clause logic as its knowledge representation, with examples expressed as attribute-value lists .The system has been tested in a variety of domains including revising a theory for the identification of promoters in DNA sequences and a theory for soybean disease diagnosis, where it has been shown to outperform a purely inductive approach.

PhD Thesis, Department of Computer Science, University of Texas at Austin.

Dirk Ourston Ph.D. Alumni ourston [at] arlut utexas edu