Theory Refinement with Noisy Data (1991)
This paper presents a method for revising an approximate domain theory based on noisy data. The basic idea is to avoid making changes to the theory that account for only a small amount of data. This method is implemented in EITHER, a propositional Horn-clause theory revision system. Empirical results on artificially corrupted data show that the method successfully prevents over-fitting: when the data is noisy, performance on novel test data is considerably better than when the theory is revised to fit the data completely. When the data is not noisy, noise processing causes no significant degradation in performance. Finally, noise processing increases efficiency and decreases the complexity of the resulting theory.
Technical Report AI91-153, Artificial Intelligence Laboratory, University of Texas.

Raymond J. Mooney (Faculty), mooney [at] cs utexas edu
Dirk Ourston (Ph.D. Alumni), ourston [at] arlut utexas edu
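
The basic idea stated in the abstract, avoiding changes to the theory that account for only a small amount of data, can be illustrated with a minimal sketch. This is not EITHER's actual procedure, which the abstract does not detail. The function name `filter_revisions`, the `min_fraction` threshold, and the representation of a candidate revision as the set of training-example indices it would fix are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def filter_revisions(
    candidate_fixes: Dict[str, Set[int]],
    n_examples: int,
    min_fraction: float = 0.05,
) -> List[str]:
    """Return the names of candidate revisions worth applying.

    candidate_fixes maps a (hypothetical) revision name to the set of
    training-example indices that the revision would correct.
    A revision is kept only if it corrects at least min_fraction of the
    training examples; revisions that explain fewer examples are treated
    as likely responses to noise and are skipped.
    """
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    threshold = min_fraction * n_examples
    kept = [
        name
        for name, fixed in candidate_fixes.items()
        if len(fixed) >= threshold
    ]
    return sorted(kept)


# Hypothetical candidates for a training set of 40 examples.
candidates = {
    "add_rule_A": {0, 1, 2, 3, 4, 5},   # fixes 6 examples
    "delete_antecedent_B": {6},         # fixes 1 example
    "add_rule_C": {7, 8, 9},            # fixes 3 examples
}
selected = filter_revisions(candidates, n_examples=40, min_fraction=0.05)
```

With these assumed inputs the threshold is 2 examples, so the revision that fixes a single example is skipped and the other two are kept. Choosing the threshold trades off fitting genuine exceptions against fitting noise; the value 0.05 here is arbitrary.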